arXiv:1503.06429v1 [stat.ML] 22 Mar 2015 hp ftedsrbtosue utb anand uhtha such maintained, be must used distributions the of shape Brazil 13083-852, SP Campinas, Campinas of Engineering University Computer and Electrical of School Zuben Von J. Fernando Miranda S. Conrado itiuin t rgnlitrrtblt sas lost. also Additionally, is cost interpretability the estimates. original its at likelihood distribution, maximum distribution, the normal for the symm does. sions of in 1976) skewness asymmetry Leonard, the introduce and control to (O’Hagan distribution possible normal also skew is it but 1994), e itiuin odsrb oedt euirt,adon and peculiarity, distribution. data used the some of distributions describe asymmetry underlying to the distributions by new limited still are they rltn iihe loain(lie l,20) hr s a where distribution. 2003), probability al., joint et models the (Blei Markov allocation hidden Dirichlet latent 1988), or mo Basford, (Kolle and the graphs (McLachlan fit probabilistic models of applicatio to use some the distributions distributions, requiring of directly, probability abundance of this with plethora even a is There Introduction 1. oke h nepeaiiy hc a eipratwe an when important be may which interpretability, the keep To hr r aual smercdsrbtos uha h l the as such distributions, asymmetric naturally are There hl hs oecmlxmdl rvd diinlflexibil additional provide models complex more these While rbe,weeahde akvmdli sdt tasokidx ind index, real stock entropy. a a less Keywords: in considerably fit with and to distributions distributio used one, learn transition may is and symmetric and likelihood model the higher Markov provides as version th hidden asymmetric well showing a as example, where least regression problem, at linear a performs in version asymmetr estimate compared The e likelihood are define maximum expressions. distributions to have closed-form shown known to with are and parameters, which priors, generalized distributions, conjugate create to known normal used with and is Laplace concept the new to of similar This shape domains. a has disjoint distribution and each where distrib distributions continuous of for mixture mixtures constrained introduces paper This oes aiu ieiodetmto,Mxuemodels Mixture estimation, likelihood Maximum models, smercDsrbtosfo osrie Mixtures Constrained from Distributions Asymmetric smercDsrbtosfo osrie Mixtures Constrained from Distributions Asymmetric smercpoaiiydsrbto,Epnnilfml,Hde Mark Hidden family, Exponential distribution, probability Asymmetric Abstract 1 n remn 09,lk mixture like 2009), Friedman, and r to itiuin sue obuild to used is distributions of et scnntb ovduigthem using solved be not can ns hsmtvtstesac for search the motivates This . [email protected] [email protected] ymdfigtesaeo the of shape the modifying by gomldsrbto Jhsne al., et (Johnson distribution ognormal h srcncos h ones the choose can user the t fpriua neeti the is interest particular of e flsn lsdfr expres- closed-form losing of t odsrb h problem, the describe to ity tos hrceie ya by characterized utions, hsdsrbto sal to able is distribution This cadsmercnormal symmetric and ic ti itiuin,lk the like distributions, etric tdvreue.However, uses. diverse st lzn te oe,the model, fitted a alyzing Bu n ere 1966), Petrie, and (Baum h aedistribution base the smercversions asymmetric oesoe states over models n pnnilfamilies, xponential tteasymmetric the at ol time-series world o h original the for s ctn htthe that icating ov Miranda and Von Zuben
he or she knows how to analyze. For instance, this is what happens with mixture models, where the known base distributions just change their parameters and are weighted. In this paper, we introduce the concept of a constrained mixture of distributions for continuous distributions, which differs from the traditional mixture in that, instead of each distribution being defined in the whole domain and being able to overlap with the other dis- tributions, the domain is partitioned among the distributions. In this way, they are defined only in their segment, and all of them are instances of the same underlying distribution with different parameters that guarantee that the continuity of the original distribution is kept. This allows weighting each segment and analyzing them separately, like one would do with the distributions in a standard mixture model. The constrained mixture is then used to create asymmetric versions of the Laplace and normal distributions, where the symmetric versions are particular cases. These new distribu- tions are shown to define an exponential family when the partitions are known, which allows them to be easily used in existing models designed to work with these kinds of distributions, like in latent Dirichlet models (Banerjee and Shan, 2007) and co-clustering (Shan and Banerjee, 2008), and their conjugate priors, with closed-form expressions, are also given. We also show for these new asymmetric distributions that, if the weight of each partition is known, then the maximum likelihood estimates are known and their closed-form expres- sions are provided. Furthermore, we provide a hill-climbing algorithm to fit the weight of the partitions, which allows maximum likelihood estimates for all the parameters. To show the power of introducing asymmetry to the normal distribution, two applica- tions are provided. The first is a simple linear regression example problem with asymmetric noise used to gain insight into how the asymmetry affects the estimation and show exper- imentally that the asymmetric likelihood is lower bounded by the symmetric likelihood. The second is a hidden Markov model used to fit a real world stock index time-series, which shows that the flexibility introduced by the asymmetry not only increases the likelihood, but may also provide insight into the system and reduce its entropy. This paper is organized as follows. Section 2 introduces the concept of constrained mixtures, and the asymmetric versions of the Laplace and normal distributions are intro- duced in Section 3. Section 4 proves optimality conditions for the maximum likelihood estimates and provides their closed-form expressions. Section 5 compares the performance of the asymmetric normal distribution with the symmetric version for one example and one real world problem, showing the advantages of the new distribution. Finally, Section 6 summarizes the findings and indicates future research directions.
2. Constrained Mixture
A constrained mixture is a special kind of mixture of distributions characterized by the existence of only one underlying distribution so that the domain is split in disjoint segments. Each segment has its own distribution, which must be similar to the base distribution, that is, there are known parameters for the base distribution that provide the shape of the distribution in the segment. Moreover, the distributions must be continuous and the weights for each segment must be provided. Since a mixture of N > 2 distributions can be described as a mixture of 2 distributions, where one of those is a mixture of N 1 distributions itself, we will develop the equations −
2 Asymmetric Distributions from Constrained Mixtures
only for the base-case of N = 2. This not only simplifies the problem, but also is associated with the number of distributions used to create the asymmetric versions of the Laplace and normal distributions.
Definition 1 (Constrained Mixture) Let φ(x; θ): R (θ) [0, ) be the continuous × D → ∞ probability density function (pdf) for some distribution D, where ( ) is the domain of its D · argument. Let φ (x; µ,θ) = φ(x; θ)I[x µ] and φ (x; µ,θ) = φ(x; θ)I[x<µ], where I[ ] is + ≥ − · the indicator function, be the partitions’ distributions. Let p (0, 1) be a weight parameter. ∈ Then the constrained mixture D∗ is described by a pdf ψ(x; µ,θ,p): R2 (θ) (0, 1) × D × → [0, ) that satisfies the following constraints for all µ, θ, and p in the domain: ∞ Constraint 1 (Continuity) The pdf is continuous at x = µ, which means that
lim ψ(x; µ,θ,p) = lim ψ(x; µ,θ,p). x→µ+ x→µ−
Constraint 2 (Mixture) There are known functions Θ (µ,θ,p): R (θ) (0, 1) (θ) ± ×D × → D and normalizing constant Z (0, ) such that ∈ ∞ ψ(x; µ,θ,p)Z = pφ (x;Θ (µ,θ,p)) + (1 p)φ (x;Θ (µ,θ,p)). − − − + + Constraint 1 guarantees that the continuity of φ( ) is preserved, while Constraint 2 · builds a mixture that forces each segment of the new pdf ψ( ) to have the same structure · as the original pdf φ( ), while also placing weight p and 1 p on the left and right sides of · − the partition, respectively. The functions Θ ( ) perform the mapping from the constraint ± · parameter µ and p and the underlying distribution parameters θ to a new set of parameters Θ±(µ,θ,p) that are used in each side of the partition. From Constraint 2 and the fact that ψ( ) is a pdf, two additional redundant constraints · can be defined, which will be used later to define auxiliary variables.
Constraint 3 (Volume) Since ψ( ) is a pdf, it has unitary volume: · ∞ ψ(x; µ,θ,p)dx = 1. Z−∞ Constraint 4 (Weighting) The mixture places weight p in the left part of the distribution, which can be written as: µ ψ(x; µ,θ,p)dx = p. Z−∞ The sampling of the new distribution D∗ can be performed by sampling u ([0, 1]) ′ ∼ U from the uniform distribution, followed by sampling from the distribution D− described by the non-normalized pdf φ ( ) if u