A Logistic Normal Mixture Model for Compositions with Essential Zeros

Item Type text; Electronic Dissertation

Authors Bear, John Stanley

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.


Link to Item http://hdl.handle.net/10150/621758

A Logistic Normal Mixture Model for Compositions with Essential Zeros

by John S. Bear

A Dissertation Submitted to the Faculty of the Graduate Interdisciplinary Program in Statistics In Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy In the Graduate College The University of Arizona

2016

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by John Bear, titled A Logistic Normal Mixture Model for Compositions with Essential Zeros and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

Date: August 18, 2016 Dean Billheimer

Date: August 18, 2016 Joe Watkins

Date: August 18, 2016 Bonnie LaFleur

Date: August 18, 2016 Keisuke Hirano

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

Date: August 25, 2016 Dissertation Director: Dean Billheimer

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of the requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that an accurate acknowledgement of the source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED: John S. Bear

Acknowledgments

Many people helped me in this quest. I would like to thank Eric Suess, Chair of Statistics and Biostatistics at California State University, East Bay. He gave me much good advice. I wish I had acted on more of it. Thanks to Joe Watkins, Head of the University of Arizona Graduate Interdisciplinary Statistics Program for, in equal parts, financial support, help with real analysis, and friendly discussions about a wide range of topics. My heartfelt thanks go to my committee, Joe Watkins, Bonnie LaFleur and Kei Hirano, for all their help with the research process. I appreciate your knowledge, patience and humor. I would especially like to thank Dean Billheimer, my adviser, for so many things: for providing time to work on the dissertation, for teaching me how to be a statistician, for sending me to an international conference, for introducing me to reproducible research, and much more. In addition I would especially like to thank Dean for hiring me in the Statistical Consulting Lab, a great place to work and to learn. I learned a great deal there, not only from him, but from my fellow students and colleagues. Thanks to Shripad Sinari, Isaac Jenkins, Julia Fisher, Kelly Yiu, Brian Hallmark, Ed Bedrick, James Lu, Anton Westveld, Phil Stevenson, Shannon Knapp, and Juli Riemenschneider. I am especially happy to say thanks to my sister, Kate, and my friends who buoyed me and helped me keep persisting. Bruce Eckholm, Libby Floden, Amy Howerter, Beth Poindexter, and Fred Lakin, thanks so much for your friendship and encouragement.

Table of Contents

Abstract  7

Chapter 1. Introduction  8

Chapter 2. Introduction to Analysis of Compositional Data  10
  2.1. Useful Notation  10
  2.2. Dirichlet Distribution  11
  2.3. Additive Logratio Transformation  13
  2.4. Definition of the Logistic Normal Distribution  13
    2.4.1. Matrices for Changing Parameterization  14
    2.4.2. The Variation Matrix  15
    2.4.3. Centered Logratio Covariance Matrix  16
    2.4.4. Strengths of the Logistic Normal Distribution  16
  2.5. Previous Approaches to Count Zeros  17
    2.5.1. Multinomial with Latent Logistic Normal - Billheimer, et al.  17
    2.5.2. Multinomial - Dirichlet - Martin-Fernandez et al.  18
  2.6. Previous Approaches to Continuous Zeros  20
    2.6.1. Treat as Rounded - Aitchison, Fry et al., Martin-Fernandez et al.  20
    2.6.2. Latent Gaussian - Butler & Glasbey  21
    2.6.3. Latent Gaussian - Leininger et al.  24
    2.6.4. Mixture with Multiplicative Logistic Normal - Stewart & Field  24
    2.6.5. Hypersphere Approaches - Stephens; Scealy & Welsh  25
    2.6.6. Multinomial Fractional Logit - Mullahy  27
    2.6.7. Variation Matrix - Aitchison & Kay  29

Chapter 3. Mixture of Logistic Normals  31
  3.1. Simplifying Assumption  31
  3.2. Logistic Normal Mixture Model  32
    3.2.1. Multivariate Normal Foundation  33
    3.2.2. Common Expectations and Variances  35
  3.3. Data Blocks  36
    3.3.1. Block Matrices of Compositions  37
    3.3.2. Transformations - Ratios and Logratios  37
    3.3.3. Means  39
  3.4. Simple Estimators  39
    3.4.1. Illustration - Spices, Lentils, and Rice  39
    3.4.2. Mean  41
    3.4.3. Variance  41
    3.4.4. Covariance  42
  3.5. Maximum Likelihood Estimators  43
    3.5.1. Likelihood  44
    3.5.2. Partial Derivatives  46
    3.5.3. MLE for Location, Given Ω  46
    3.5.4. Unbiasedness of Conditional MLE for 3-Part Composition  47
    3.5.5. General Maximum Likelihood Estimators  47
  3.6. Variances of Location Estimators  48
    3.6.1. Variances of Location Estimators  48
    3.6.2. Relative Efficiency of Location Estimators  49
    3.6.3. Summary of Relative Efficiency  55

Chapter 4. Subcompositional Coherence  56
  4.1. Examples  57
    4.1.1. Naive Estimator, µ̊  57
    4.1.2. ALR̃ Estimator, µ*  58
  4.2. Covariance Estimators  59
  4.3. More From the Literature  60
  4.4. An Alternative Characterization  61

Chapter 5. Two-sample Permutation Test  65
  5.1. Introduction  65
  5.2. A Permutation Test  66
    5.2.1. Test Statistic  67
    5.2.2. Pooled Covariance, S_p  67
    5.2.3. Difference of Means, ∆  68
    5.2.4. Test Definition  69
    5.2.5. Procedure  71
  5.3. Comparison of Mixing Parameters  72

Chapter 6. A Proteomic Application  74
  6.1. Serum Amyloid A Compositions in a Diabetes Study  74
    6.1.1. SAA Proteomic Data  74
  6.2. Mixing Parameters  76
  6.3. Variance  76
  6.4. Data Plots  77
  6.5. Permutation Test for Means  79

Chapter 7. Conclusion  80

Appendix A. Variances of Estimators  81
  A.1. Variance of Location MLE, µ̂  81
  A.2. Variance of Simple Location Estimator, µ*  83

Appendix B. Serum Amyloid A Data  85

References  87

Abstract

Compositions are vectors of nonnegative numbers that sum to a constant, usually one or 100%. They arise in a wide array of fields: geological sampling, budgets, fat/protein/carbohydrate in foods, percentage of the vote acquired by each political party, and more. The usual candidate distributions for modeling compositions, the Dirichlet and the logistic normal distribution, have density zero if any component is zero. While statistical methods have been developed for "rounded" zeros, zeros stemming from values below a detection level, and zeros arising from count data, there remain problems with essential zeros, i.e., cases in continuous compositions where a component is truly absent. We develop a model for compositions with essential zeros based on an approach by Aitchison and Kay (2003). It uses a mixture of additive logistic normal distributions of different dimension, related by common parameters. With the requirement of an additional constraint, we develop a likelihood and methods for estimating location and dispersion parameters. We also develop a permutation test for a two-group comparison, and demonstrate the model and test using data from a diabetes study. These results provide the first use of the additive logistic normal distribution for modeling and testing compositional data in the presence of essential zeros.

Chapter 1

Introduction

A composition is a vector of nonnegative values that sum to a constant, typically one or 100%. For instance, the percentages of protein, fat, and carbohydrate in a food, the proportions of different minerals composing a geological sample, and measurements of the proportions of different forms of a protein in a sample of blood are compositions. Compositions are used in cases where the relative abundances are of interest and, often, the total abundance is not. Because the parts of a composition sum to a constant, they are not independent. Methods for analysis need to account for the sum constraint. There are two distributions frequently used for compositional data analyses, the Dirichlet and the logistic normal. In both distributions, a zero component entails zero density. The logistic normal distribution involves transforming a composition by taking logs of ratios of the parts, but log(0) is not defined. The Dirichlet probability density involves a product of terms $x_i^{\alpha_i - 1}$, so if one of the $x_i = 0$, the value of the density is zero.

It can happen, though, when dealing with real data, that one encounters zeros. The analyst faces a choice of whether to model zero as a legitimate value, or to view it as the result of rounding, or of instrumental failure to detect, and to replace zeros with some small positive value. The phrase essential zero has been used (Aitchison (1986); Aitchison and Kay (2003)) to refer to zeros that the analyst wants to treat as the actual value. Aitchison and Kay (2003) wrote, "By an essential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part." Note that whether something is an essential zero is a judgment made by the analyst. A machine with a detection limit of n parts per billion could output zero for cases of less than n parts per billion. One must decide whether to treat zero as the actual value, or as a stand-in for some small unknown value below the level of detection.

This dissertation presents an approach for modeling zeros in compositional data. In Chapter 2 we discuss analysis of compositions in general, and approaches that have been proposed for components with zeros. In Chapter 3 we propose a new model extending ideas of Aitchison and Kay (2003) and define parameter estimators. The model is a mixture of logistic normal distributions of different dimensions, but with a shared mean vector and covariance matrix. In Chapter 4 we discuss the issue of subcompositional coherence, which is referred to in the literature as a desirable property for a model, but which is not well understood in the context of zeros. We propose new conditions defining subcompositional coherence. In Chapter 5 we develop a two-group permutation test of means. In Chapter 6 we illustrate the use of the estimators and the test on data from a study involving distributional differences in serum amyloid A from human blood samples among patients with and without type 2 diabetes. In Chapter 7 we wrap up with some discussion and conclusions.

Chapter 2

Introduction to Analysis of Compositional Data

Two commonly used distributions for modeling compositional data are the Dirichlet and the logistic normal. We will describe both briefly, and argue for continuing the approach put forth by Aitchison and Kay (2003) for extending the logistic normal.

2.1 Useful Notation

First we present some basic definitions and notation about compositions, from Aitchison (1986), before proceeding.

Convention: We use $D$ for the number of parts in a composition, and $d = D - 1$.

Definition: If $w = (w_1, w_2, \ldots, w_d, w_D)$ and $w_i \geq 0$, then $w$ is called a basis.

Definition: If $x = (x_1, x_2, \ldots, x_d, x_D)$, $x_i \geq 0$, and $\sum_{i=1}^{D} x_i = 1$, then $x$ is called a composition.

Definition: The $d$-dimensional simplex embedded in $D$-dimensional real space is the set defined by

S^d = \{(x_1, \ldots, x_D) : x_1 > 0, \ldots, x_D > 0; \ \sum_{i=1}^{D} x_i = 1\}.

Definition: The closure operation, $C$, transforms a basis into a composition. Let

w = (w_1, w_2, \ldots, w_d, w_D), \quad \text{where } w_i \geq 0, \quad \text{and} \quad t = \sum_{i=1}^{D} w_i.

Then $x = C(w) = w/t$ is a composition. Sometimes we refer to closure as renormalization.

2.2 Dirichlet Distribution

A common distribution for compositional data is the Dirichlet distribution. The details of the Dirichlet distribution are listed here as a convenient reminder, from Kotz et al. (2004), pp. 487-488. Let $x = (x_1, x_2, \ldots, x_D) \in S^d$ with $D \geq 2$. The Dirichlet density function is

f(x) = \frac{1}{B(\alpha)} \prod_{i=1}^{D} x_i^{\alpha_i - 1}, \quad \text{where} \quad B(\alpha) = \frac{\prod_{i=1}^{D} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{D} \alpha_i\right)},

and where $\alpha = (\alpha_1, \ldots, \alpha_D)$ and $\alpha_i > 0$ for $1 \leq i \leq D$.

The mean is:  E(X_i) = \frac{\alpha_i}{\sum_{i=1}^{D} \alpha_i}.   (2.1)

The variance is:  \mathrm{Var}(X_i) = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}, \quad \text{where } \alpha_0 = \sum_{i=1}^{D} \alpha_i.   (2.2)

The covariance is:  \mathrm{Cov}(X_i, X_j) = \frac{-\alpha_i \alpha_j}{\alpha_0^2(\alpha_0 + 1)} \quad (i \neq j).   (2.3)

The Dirichlet distribution has the desirable property that it deals well with aggregations. If one wants to group k components together into a single category, the parameter of the new super category is derivable as the sum of the parameters of the individual components it comprises. The example below shows how to combine the ith and jth components of a Dirichlet distribution and get the parameterization for the new distribution.

If X = (X_1, \ldots, X_i, X_j, \ldots, X_D) \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_i, \alpha_j, \ldots, \alpha_D), then   (2.4)

X' = (X_1, \ldots, X_i + X_j, \ldots, X_D) \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_D).   (2.5)
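As a quick numerical illustration of the aggregation property (2.4)-(2.5), the following sketch (assuming numpy is available; the code is ours, not part of the original text) draws Dirichlet samples, aggregates two components, and compares the empirical mean of the aggregated part with the mean implied by Equation (2.1) for the combined parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])          # parameters of a 3-part Dirichlet
x = rng.dirichlet(alpha, size=100_000)     # samples on the simplex

# Aggregate the first two components (indices 0 and 1) into one category.
x_agg = np.column_stack([x[:, 0] + x[:, 1], x[:, 2]])

# By (2.4)-(2.5) the aggregated vector is Dir(alpha_1 + alpha_2, alpha_3),
# so its mean should be (alpha_1 + alpha_2, alpha_3) / alpha_0.
alpha_agg = np.array([alpha[0] + alpha[1], alpha[2]])
print(x_agg.mean(axis=0))          # empirical mean of aggregated samples
print(alpha_agg / alpha.sum())     # theoretical mean from Equation (2.1)
```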

There are two main disadvantages to using the Dirichlet distribution, articulated by Aitchison (1986). First, the covariance between any two components of a Dirichlet random vector is negative, fixed, and determined completely by the other parameters (Equation 2.3). The Dirichlet distribution cannot model a composition which has a positive covariance between two of the components. More generally, the Dirichlet does not have anything like the rich covariance structure that is available in the multivariate normal distribution. The second disadvantage of the Dirichlet distribution is that inferences about subcompositions are not necessarily consistent with inferences based on the whole composition. The requirement that inferences about subcompositions be consistent with inferences based on the whole composition has been called subcompositional coherence (Aitchison and Egozcue (2005) and many others), and we will have much more to say about it in Chapter 4. In response to his objections to these two aspects of the Dirichlet distribution, Aitchison developed the logistic normal distribution.

2.3 Additive Logratio Transformation

The definition of the logistic normal distribution hinges on the additive logratio trans- formation, defined here.

Definition: If $x = (x_1, x_2, \ldots, x_d, x_D)$, then $x_{-D} = (x_1, x_2, \ldots, x_d)$.

Definition: The additive logratio transformation, alr, is defined as follows:

alr : S^d \to R^d
x \mapsto y = (\log(x_1/x_D), \log(x_2/x_D), \ldots, \log(x_d/x_D)).   (2.6)

We define the shorthand:

\log(x_{-D}/x_D) = (\log(x_1/x_D), \log(x_2/x_D), \ldots, \log(x_d/x_D)).

Since alr is one-to-one, its inverse exists. It is called the logistic transformation, alr^{-1}, defined as:

x_i = \exp(y_i) / \{\exp(y_1) + \cdots + \exp(y_d) + 1\} \quad (i = 1, \ldots, d),   (2.7)
x_D = 1 - x_1 - x_2 - \cdots - x_d   (2.8)
    = 1 / \{\exp(y_1) + \cdots + \exp(y_d) + 1\}.   (2.9)
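A minimal sketch of the closure operation, the alr transformation (2.6), and its inverse (2.7)-(2.9), assuming numpy; the function names are ours and are meant only as an illustration of the definitions above.

```python
import numpy as np

def closure(w):
    """C(w): renormalize a nonnegative basis w to a composition summing to 1."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def alr(x):
    """Additive logratio (2.6): x in S^d (all parts > 0) -> y in R^d."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_inv(y):
    """Logistic transformation (2.7)-(2.9): y in R^d -> D-part composition."""
    y = np.asarray(y, dtype=float)
    denom = np.exp(y).sum() + 1.0
    return np.append(np.exp(y) / denom, 1.0 / denom)

x = closure([2.0, 3.0, 5.0])
y = alr(x)
print(np.allclose(alr_inv(y), x))   # True: alr_inv undoes alr
```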

2.4 Definition of the Logistic Normal Distribution

Definition: A D-part composition x with strictly positive components has an additive logistic normal distribution $L^d(\mu, \Omega)$ if $y = \log(x_{-D}/x_D)$ has a $N^d(\mu, \Omega)$ distribution.

The density function for the logistic normal distribution is derived by transforming the normal density. If x is a D-place composition and $y = \log(x_{-D}/x_D)$, then $\mathrm{jac}(y|x_{-D}) = (x_1 \cdots x_D)^{-1}$, and the density function $f_x$ is:

f_x(x; \mu, \Omega) = (2\pi)^{-d/2} |\Omega|^{-1/2} (x_1 \cdots x_D)^{-1} \exp\left\{-\tfrac{1}{2}\big(\log(x_{-D}/x_D) - \mu\big)' \Omega^{-1} \big(\log(x_{-D}/x_D) - \mu\big)\right\}.   (2.10)
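A sketch of evaluating the density (2.10) by transforming to logratios and multiplying the d-variate normal density by the Jacobian term $(x_1 \cdots x_D)^{-1}$; it assumes scipy and numpy, and the parameter values below are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def logistic_normal_pdf(x, mu, Omega):
    """Density (2.10) of L^d(mu, Omega) at a strictly positive D-part composition x."""
    x = np.asarray(x, dtype=float)
    y = np.log(x[:-1] / x[-1])       # additive logratios
    jac = 1.0 / np.prod(x)           # Jacobian factor (x_1 ... x_D)^{-1}
    return multivariate_normal.pdf(y, mean=mu, cov=Omega) * jac

mu = np.array([-1.6, -0.5])
Omega = np.array([[0.0065, -0.0010],
                  [-0.0010, 0.0004]])
print(logistic_normal_pdf([0.11, 0.33, 0.56], mu, Omega))
```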

An important fact about the logistic normal distribution is that the likelihood is invariant to the order of the components in the composition, assuming there are no zeros in any of the compositions. This result is proved in Aitchison (1986), pages 118-119, relying on results in pages 93-98. There are two other parameterizations for this distribution, which we describe next for completeness, but all of our work revolves around the definition based on the additive logratio transformation. It is often referred to as the alr parameterization, and it is the one which makes the most direct link to the full-rank multivariate normal distribution. The other two are derivable from it and each other via matrix algebra, given next.

2.4.1 Matrices for Changing Parameterization

To transform one parameterization into another requires the matrices in the table below, which is Aitchison's Table 4.4, p. 81, with three additional rows added at the bottom. They are used in the next section.

Notation    Definition                  Order    Rank
I_d         identity matrix             d x d    d
J_d         matrix of units             d x d    1
j_d         column vector of units      d x 1    1
F_{d,D}     [I_d : -j_d]                d x D    d
G_D         I_D - D^{-1} J_D            D x D    d
H_d         I_d + J_d                   d x d    d
e_i         ith unit vector             D x 1    1
0_d         column vector of zeros      d x 1    1
M_{d,D}     [I_d : 0_d]                 d x D    d

2.4.2 The Variation Matrix

One of the parameterizations of the logistic normal distribution is based on the logs of ratios of all possible pairs of components. The location parameter is a matrix, denoted $\xi$, and the variation parameter is a matrix, $T$. For a composition $x = (x_1, x_2, \ldots, x_d, x_D)$, and for $i, j = 1, 2, \ldots, D$:

\xi = [\xi_{i,j}], \quad \text{where } \xi_{i,j} = E\{\log(x_i/x_j)\}, \quad \text{and}
T = [\tau_{i,j}], \quad \text{where } \tau_{i,j} = \mathrm{Var}\{\log(x_i/x_j)\}.

As can be seen below, both matrices have zeros on the main diagonal because $\log(x_i/x_i) = 0$. In addition, since $\log(x_i/x_j) = -\log(x_j/x_i)$, the lower-diagonal elements of $\xi$ are just the negatives of the upper-diagonal elements.

\xi = \begin{pmatrix}
0 & \xi_{1,2} & \xi_{1,3} & \cdots & \xi_{1,d} & \xi_{1,D} \\
-\xi_{1,2} & 0 & \xi_{2,3} & \cdots & \xi_{2,d} & \xi_{2,D} \\
\vdots & \vdots & \ddots & & & \vdots \\
-\xi_{1,d} & -\xi_{2,d} & \cdots & & 0 & \xi_{d,D} \\
-\xi_{1,D} & -\xi_{2,D} & \cdots & & -\xi_{d,D} & 0
\end{pmatrix}, \qquad
T = \begin{pmatrix}
0 & \tau_{1,2} & \tau_{1,3} & \cdots & \tau_{1,d} & \tau_{1,D} \\
\tau_{1,2} & 0 & \tau_{2,3} & \cdots & \tau_{2,d} & \tau_{2,D} \\
\vdots & \vdots & \ddots & & & \vdots \\
\tau_{1,d} & \tau_{2,d} & \cdots & & 0 & \tau_{d,D} \\
\tau_{1,D} & \tau_{2,D} & \cdots & & \tau_{d,D} & 0
\end{pmatrix}.

In the case where there are no zeros, it is possible to map between these parameterizations. In terms of $T$, the covariance matrix $\Omega$ is

\Omega = -\tfrac{1}{2} F T F'. \qquad \text{(Aitchison (1986), pp. 82-83.)}

For the location, $\mu$ is the last column of $\xi$, but without the zero at the bottom,

\mu = M \xi e_D.

2.4.3 Centered Logratio Covariance Matrix

Another parameterization involves centered logratios, z,

z = \log(x/g(x)), \quad \text{where } g(x) = (x_1 x_2 \cdots x_D)^{1/D}.   (2.11)

The centered logratio covariance matrix, $\Gamma$, is

\Gamma = [\gamma_{ij}] = \big[\mathrm{Cov}[\log(x_i/g(x)), \log(x_j/g(x))] : i, j = 1, 2, \ldots, D\big].   (2.12)

It is related to $\Omega$ by the equation $\Gamma = F' H^{-1} \Omega H^{-1} F$. It has the undesirable attribute that it is not full rank, hence not positive definite. For all of our work, we use the $\Omega$ parameterization because, of the three, only $\Omega$ is a positive definite covariance matrix.
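A sketch (numpy assumed) of the matrices F and H from the table in Section 2.4.1 and the two conversions just given, $\Omega = -\tfrac{1}{2} F T F'$ and $\Gamma = F' H^{-1} \Omega H^{-1} F$; the function names are illustrative, and the example variation matrix T is made up.

```python
import numpy as np

def F_matrix(d):
    """F_{d,D} = [I_d : -j_d], a d x (d+1) matrix."""
    return np.hstack([np.eye(d), -np.ones((d, 1))])

def H_matrix(d):
    """H_d = I_d + J_d."""
    return np.eye(d) + np.ones((d, d))

def omega_from_T(T):
    """Omega = -1/2 * F T F'  (the relation quoted from Aitchison, pp. 82-83)."""
    d = T.shape[0] - 1
    F = F_matrix(d)
    return -0.5 * F @ T @ F.T

def gamma_from_omega(Omega):
    """Centered logratio covariance: Gamma = F' H^{-1} Omega H^{-1} F."""
    d = Omega.shape[0]
    F, Hinv = F_matrix(d), np.linalg.inv(H_matrix(d))
    return F.T @ Hinv @ Omega @ Hinv @ F

# Example: a 3-part variation matrix T (zero diagonal, symmetric, made up).
T = np.array([[0.0, 0.5, 0.8],
              [0.5, 0.0, 0.7],
              [0.8, 0.7, 0.0]])
Omega = omega_from_T(T)
print(Omega)
print(gamma_from_omega(Omega).sum(axis=1))   # rows of Gamma sum to ~0: not full rank
```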

2.4.4 Strengths of the Logistic Normal Distribution

Two main strengths of the logistic normal distribution are the flexible covariance structure, and the wealth of resources available for the normal distribution, both theoretical and computational.

2.5 Previous Approaches to Count Zeros

Compositional data that arise from counts have been treated with different Bayesian approaches.

2.5.1 Multinomial with Latent Logistic Normal - Billheimer, et al.

Billheimer et al. (2001) report on a study involving counts of arthropods (spiders) on different plots with different treatments in an area on Mount St. Helens. They propose a hierarchical model for count data. They posit a conditional multinomial distribution for the counts, with the event probability parameter, z, being a composition with a logistic normal prior density, $z \sim L(\mu, \Sigma)$. In addition they use a normal hyperprior for $\mu$, $\mu \sim N(\eta, \Omega)$, and an inverse Wishart hyperprior for $\Sigma$, $\Sigma^{-1} \sim \mathrm{Wishart}(\Psi^{-1}, \rho)$. The constants for the hyperpriors are chosen to be diffuse, yet still give a proper posterior distribution. The constants are:

\eta = 0_{k-1}
\Omega = aN
N = I_{k-1} + j_{k-1} j_{k-1}^T
a = .5
\Psi = cN
c = .1
\rho = k - 1

The observed multinomial count data can contain zeros. The event probability parameter cannot. Suppose the ith observation is a k-place multinomial $(y_{i1}, y_{i2}, \ldots, y_{ik})$, where $i \in \{1, 2, \ldots, n\}$ if there are n data points, $j \in \{1, 2, \ldots, k\}$, and the $y_{ij}$ are integer counts, i.e., $y_{ij} \in \{0, 1, 2, \ldots\}$. Then the likelihood $\times$ prior is:

P(z, \mu, \Sigma | y) \propto \prod_{i=1}^{n} \left\{ \left(\prod_{j=1}^{k} z_{ij}^{(y_{ij} - 1)}\right) |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathrm{alr}(z_i) - \mu)^T \Sigma^{-1} (\mathrm{alr}(z_i) - \mu)\right) \right\}
\times |\Omega|^{-1/2} \exp\left(-\tfrac{1}{2}(\mu - \eta)^T \Omega^{-1} (\mu - \eta)\right)
\times |\Psi|^{\rho/2} |\Sigma|^{-(\rho - k)/2} \exp\left(-\tfrac{1}{2}\mathrm{tr}(\Psi \Sigma^{-1})\right)

They use MCMC to estimate unknown parameters. Their approach deals well with zeros provided the data are discrete counts, but it is not clear how to extend the approach to deal with continuous data.

2.5.2 Multinomial - Dirichlet - Martin-Fernandez et al.

Martín-Fernández et al. (2015) propose a method for replacing compositional zeros from count data with a small number. This is a refinement of an approach described in Daunis-i-Estadella et al. (2008), with some of the same authors, so we only discuss the recent paper. The authors start with a Bayesian model with a multinomial likelihood and an assortment of Dirichlet priors: Perks, Jeffreys, Bayes-Laplace, square root, and a new one they develop, which they abbreviate GBM, for Geometric Bayesian Multiplicative. The value used to substitute for zero is the expectation of the Bayesian posterior for the relevant part. The details follow.

With a Dirichlet prior with parameters D and $\alpha = (\alpha_1, \ldots, \alpha_D)$, they rewrite $\alpha$ as the product of a strength, s, and a composition, $t = (t_1, \ldots, t_D)$: $\alpha = st$. In addition, a data point $c = (c_1, \ldots, c_D)$ is a vector of counts for each of the D categories, and they let $n = \sum_j c_j$. With this notation, the posterior density function is proportional to $\prod_{j=1}^{D} \pi_j^{c_j + s t_j - 1}$, and so the posterior Bayesian expectation is

E[\pi_j | c] = \frac{c_j + s \cdot t_j}{n + s}.   (2.13)

They investigate a variety of priors where s varies and $t = (\tfrac{1}{D}, \ldots, \tfrac{1}{D})$. Substituting s, t, and $c_j = 0$ into the expectation (2.13) gives different possible values to use as replacements for zero. The values for s are calculated to correspond to priors from the literature: Haldane: $s = 0$; Perks: $s = 1$; Jeffreys: $s = \tfrac{D}{2}$; Bayes-Laplace: $s = D$; square root (SQ): $s = \sqrt{n}$. Their replacement model is to replace the composition $x_i = c_i/n_i$ by the vector $r_i = (r_{i1}, \ldots, r_{iD})$, where

r_{ij} = \begin{cases} t_{ij} \cdot \dfrac{s_i}{n_i + s_i}, & \text{if } x_{ij} = 0, \\[4pt] x_{ij} \cdot \left(1 - \displaystyle\sum_{k | x_{ik} = 0} t_{ik} \cdot \frac{s_i}{n_i + s_i}\right), & \text{if } x_{ij} > 0. \end{cases}

This replaces the zeros by their posterior Bayesian estimates, $E[\pi_j | c]$. This scheme preserves ratios of parts, so that

\frac{r_{ij}}{r_{ik}} = \frac{x_{ij}}{x_{ik}}; \qquad \sum_{j=1}^{D} r_{ij} = 1.   (2.14)

They then modify this approach, augmenting it by using a leave-one-out method to consider the data in the other compositions when calculating a posterior estimate to replace zero. With N different compositions from counts, they define

\alpha_{ij} = \sum_{k=1, k \neq i}^{N} c_{kj} \quad \text{and} \quad t_{ij} = \frac{\alpha_{ij}}{\sum_{k=1}^{D} \alpha_{ik}} = \hat{m}_{ij},   (2.15)

so $t_i = \hat{m}_i = (\hat{m}_{i1}, \ldots, \hat{m}_{iD})$. They then revise their substitution scheme, with $g_i = (\prod_j x_{ij})^{1/D}$, so that

r_{ij} = \begin{cases} \dfrac{\hat{m}_{ij}}{g_i \cdot n_i + 1}, & \text{if } x_{ij} = 0, \\[4pt] x_{ij} \cdot \left(1 - \displaystyle\sum_{k | x_{ik} = 0} \frac{\hat{m}_{ik}}{g_i \cdot n_i + 1}\right), & \text{if } x_{ij} > 0. \end{cases}   (2.16)

They call the replacement strategy in Equation (2.16) a Geometric Bayesian Multiplicative strategy, and abbreviate it GBM. Their strategy is to impute values for count zeros. They say (p. 136), "In any case, it is clear that strategies for replacing the essential zero by a small value are not appropriate." Our approach is to model essential zeros.
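A sketch of the basic Bayesian-multiplicative replacement in Equations (2.13)-(2.14) for a single vector of counts, assuming numpy; s and t here are the prior strength and prior composition (for example, Perks corresponds to s = 1 with a uniform t), and the code is illustrative rather than a reimplementation of the authors' software.

```python
import numpy as np

def bayes_mult_replace(counts, s=1.0, t=None):
    """Replace count zeros by posterior expectations (2.13) and shrink the
    nonzero parts multiplicatively so the result is still a composition (2.14)."""
    c = np.asarray(counts, dtype=float)
    D, n = c.size, c.sum()
    if t is None:
        t = np.full(D, 1.0 / D)          # uniform prior composition
    x = c / n                            # observed composition
    zero = (x == 0)
    r = np.empty(D)
    r[zero] = t[zero] * s / (n + s)                       # imputed zero parts
    r[~zero] = x[~zero] * (1.0 - (t[zero] * s / (n + s)).sum())
    return r

counts = np.array([12, 0, 3, 0, 25])
r = bayes_mult_replace(counts, s=1.0)        # Perks-style prior: s = 1
print(r, r.sum())                            # replaced composition sums to 1
```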

2.6 Previous Approaches to Continuous Zeros

2.6.1 Treat as Rounded - Aitchison, Fry et al., Martin-Fernandez et al.

One approach to essential zeros is to treat them like rounded zeros. Aitchison (1986) outlined the following approach, which others have used, e.g., Fry et al. (2000). Assume the data have a logistic normal distribution, $L(\mu, \Omega)$, and assume that if it were not for rounding, all the values would be strictly positive. Suppose a D-place composition has C elements which are rounded down to zero. Further suppose that the maximum rounding error is $\delta$. Let $\epsilon = \delta(C + 1)(D - C)/D^2$.

Transform the composition by adding $\epsilon$ to each of the zeros, and subtracting $C\epsilon/(D - C)$ from each nonzero component. Subtracting $C\epsilon/(D - C)$ ensures the same total amount is subtracted from the composition as is added. Aitchison (1986) does not explain why he chose $\epsilon = \delta(C + 1)(D - C)/D^2$, but the formula has the property that at the extremes, $C = 0$ and $C = D - 1$, $\epsilon = \delta/D$. When C takes on other values, $\delta/D \leq \epsilon \leq \delta/2$. Adding some fraction of the maximum rounding error between $\delta/D$ and $\delta/2$ seems not unreasonable.

Martín-Fernández et al. (2011) pursue a similar approach but improve on it in two ways, first with a multiplicative substitution, and then with an EM algorithm. In their multiplicative substitution method, if x is a D-place composition with C zero components, each zero component is replaced by $\delta$, and each nonzero component, $x_i$, is replaced by a slightly smaller amount: $(1 - C\delta)x_i$. They also stipulate that $\delta$ must be less than the maximum rounding error.

Their EM algorithm relies on the data being transformed by the additive logratio transformation. If the ith composition is $x_i$, then $Y_i = [y_{ij}] = \log(x_{i,-D}/x_{iD})$. They say, "if a below-detection values occur [sic] when $x_{ij} < \epsilon_{ij}$, where $\epsilon_{ij}$ is the detection limit, its alr-transformed value $y_{ij}$ is missing data in Y. The unknown value $y_{ij}$ then verifies $y_{ij} < \ln(\epsilon_{ij}/x_{iD}) = \psi_{ij}$" (p. 47). They specify the replacement this way: "Then, on the tth iteration, the modified E step replaces the values $y_{ij}$ in the data set by

y_{ij}^{(t)} = \begin{cases} y_{ij} & \text{if } y_{ij} \geq \psi_{ij}, \\ E[y_{ij} \,|\, y_{i,-j}, y_{ij} < \psi_{ij}, \mu^{(t)}, \Omega^{(t)}] & \text{if } y_{ij} < \psi_{ij}, \end{cases}   (2.17)

where $y_{i,-j}$ denotes the set of observed variables for the row i of the data matrix Y." They make explicit their assumption that they need at least one component to be free of zeros (p. 48). We will come back to this assumption later. The approach described here is an imputation method for cases where the zeros are rounded or below the level of detection. The rest of the papers discussed below are about approaches to essential zeros.
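A sketch of the multiplicative substitution just described (numpy assumed): each zero is replaced by a small $\delta$ and each nonzero part $x_i$ is shrunk to $(1 - C\delta)x_i$, where C is the number of zeros. The EM refinement is not shown, and the function name is ours.

```python
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace rounded zeros in a composition x by delta, shrinking the
    nonzero parts by (1 - C*delta) so the result still sums to 1."""
    x = np.asarray(x, dtype=float)
    zero = (x == 0)
    C = zero.sum()
    return np.where(zero, delta, (1.0 - C * delta) * x)

x = np.array([0.0, 0.35, 0.0, 0.65])
print(multiplicative_replacement(x, delta=0.005))   # sums to 1
```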

2.6.2 Latent Gaussian - Butler & Glasbey

Butler and Glasbey (2008) propose a latent Gaussian model for compositions with essential zeros. Their model, by their own admission, only works for 2- or 3-place compositions. It is complicated and cumbersome for 3-place compositions, and does not work at all for compositions of higher degree. Below is a description of their account for 2-place compositions.

If $x = (x_1, x_2)$ is a 2-place composition, then $x_1 + x_2 = 1$, $0 \leq x_1 \leq 1$, and $0 \leq x_2 \leq 1$. They model $x_1$ with a Gaussian random variable $z_1$ where $z_1 \sim N(\mu, \sigma)$, $0 < \mu < 1$, and let $z_2 = 1 - z_1$.

Case 1: If $0 < z_1 < 1$, then $(z_1, z_2)$ is a proper composition, and the mapping from z to x is simply $z = x$.

Case 2: If $z_1 < 0$, then $z_1$ is mapped to 0 and $z_2$ is mapped to 1.

Case 3: If $z_1 > 1$, then $z_1$ is mapped to 1 and $z_2$ is mapped to 0.

The log likelihood is partitioned according to these same three cases.

Case 1: If $0 < z_1 < 1$, then $z_1 = x_1$ and the log likelihood of the single point $x_1$ is the log likelihood of a normal distribution:

-\log(\sigma) - \frac{\log(2\pi)}{2} - \frac{1}{2}\left(\frac{x_1 - \mu}{\sigma}\right)^2.   (2.18)

If we have multiple points $(x_{11}, x_{12}), (x_{21}, x_{22}), \ldots, (x_{k1}, x_{k2})$ where $0 < x_{i1} < 1$, we multiply the term $-\log(\sigma) - \frac{\log(2\pi)}{2}$ by the cardinality of M, $||M||$, where $M = \{i : 0 < x_{i1} < 1\}$. In that case the contribution to the likelihood from points $x_{i1}$ where $i \in M$ is

-||M||\left(\log(\sigma) + \frac{\log(2\pi)}{2}\right) - \sum_{i \in M} \frac{1}{2}\left(\frac{x_{i1} - \mu}{\sigma}\right)^2.   (2.19)

For the next two cases, we let $\Phi$ denote the cumulative distribution function of the standard Gaussian distribution.

Case 2: If $z_1 < 0$, then $z_1$ is mapped to $x_1 = 0$ and $z_2$ is mapped to $x_2 = 1$. In this case, assuming $z_1 \sim N(\mu, \sigma)$ with $0 < \mu < 1$, all the probability mass from $-\infty$ to 0 is assigned to the point $x_1 = 0$. If we have a collection of data points of the form $(x_{i1}, x_{i2}) = (0, 1)$, then we let $L = \{i : x_{i1} = 0\}$, and the contribution to the likelihood from all the points where $x_{i1} = 0$ is

||L|| \log \Phi\left(\frac{-\mu}{\sigma}\right).   (2.20)

Case 3: If $z_1 > 1$, then $z_1$ is mapped to $x_1 = 1$ and $z_2$ is mapped to $x_2 = 0$. In this case, assuming $z_1 \sim N(\mu, \sigma)$ with $0 < \mu < 1$, all the probability mass from 1 to $\infty$ is assigned to the point $x_1 = 1$. If we have a number of data points of the form $(x_{i1}, x_{i2}) = (1, 0)$, then we let $U = \{i : x_{i1} = 1\}$, and the contribution to the likelihood from all the points where $x_{i1} = 1$ is

||U|| \log\left(1 - \Phi\left(\frac{1 - \mu}{\sigma}\right)\right).   (2.21)

Adding the three cases together gives the log likelihood for their distribution:

\log l(\mu, \sigma | y) = ||L|| \log \Phi\left(\frac{-\mu}{\sigma}\right) + ||U|| \log\left(1 - \Phi\left(\frac{1 - \mu}{\sigma}\right)\right) - ||M||\left(\log(\sigma) + \frac{\log(2\pi)}{2}\right) - \sum_{i \in M} \frac{1}{2}\left(\frac{x_{i1} - \mu}{\sigma}\right)^2.   (2.22)

The authors admit that their model "violates two of the principles – scale invariance and subcompositional coherence – that models for data on the interior of the simplex should adhere to" (p. 515).
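A sketch of the partitioned log likelihood (2.22) for 2-part compositions, assuming numpy and scipy; it simply accumulates the three contributions (2.19)-(2.21), and the data values below are made up.

```python
import numpy as np
from scipy.stats import norm

def bg_loglik(x1, mu, sigma):
    """Latent-Gaussian log likelihood (2.22) for the first parts x1 of
    2-part compositions (x2 = 1 - x1 is implied)."""
    x1 = np.asarray(x1, dtype=float)
    L = (x1 == 0).sum()                       # points mapped from z1 < 0
    U = (x1 == 1).sum()                       # points mapped from z1 > 1
    mid = x1[(x1 > 0) & (x1 < 1)]             # interior points, the set M
    ll = L * norm.logcdf(-mu / sigma)
    ll += U * np.log1p(-norm.cdf((1 - mu) / sigma))
    ll += -mid.size * (np.log(sigma) + 0.5 * np.log(2 * np.pi))
    ll += -0.5 * np.sum(((mid - mu) / sigma) ** 2)
    return ll

data = np.array([0.0, 0.2, 0.35, 1.0, 0.6, 0.0, 0.8])
print(bg_loglik(data, mu=0.4, sigma=0.3))
```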

2.6.3 Latent Gaussian - Leininger et al.

In contrast to Butler and Glasbey (2008), Leininger et al. (2013) have developed a more practical treatment of compositions as coming from a latent Gaussian random variable, where a compositional component is zero when the corresponding latent Gaussian component is less than or equal to zero. They develop a hierarchical power model with the transformation

X_k = \frac{(\max(0, Z_k))^\gamma}{1 + \sum_{k'=1}^{d}(\max(0, Z_{k'}))^\gamma},

where $Z_k$ is the kth normal component and $X_k$ is the corresponding compositional component, D is the number of parts in the composition, $d = D - 1$, and

X_D = \left(1 + \sum_{k'=1}^{d}(\max(0, Z_{k'}))^\gamma\right)^{-1}.

The corresponding inverse transformation is $Z_k = (X_k/X_D)^{1/\gamma}$ if $X_k > 0$, and $Z_k \leq 0$ (latent) if $X_k = 0$, for $k = 1, 2, \ldots, d$. To estimate parameters they use MCMC. They considered 1, 2, and 3 as possible values for $\gamma$ and conclude that $\gamma = 2$ works best. One limitation of their approach is also a limitation of ours: we require one component of the composition to be strictly positive.
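A sketch (numpy assumed) of the power transformation just described and its partial inverse: latent Gaussian components Z map to a composition X, and the inverse recovers $(X_k/X_D)^{1/\gamma}$ for positive parts, while zero parts only bound the latent value. Function names are ours.

```python
import numpy as np

def z_to_composition(z, gamma=2.0):
    """Map a latent d-vector z to a D = d+1 part composition via the power model."""
    z = np.asarray(z, dtype=float)
    pos = np.maximum(0.0, z) ** gamma
    denom = 1.0 + pos.sum()
    return np.append(pos / denom, 1.0 / denom)   # last part is X_D

def composition_to_z(x, gamma=2.0):
    """Inverse map for the positive parts: Z_k = (X_k / X_D)^(1/gamma).
    Parts with X_k = 0 correspond to latent Z_k <= 0 and are returned as nan."""
    x = np.asarray(x, dtype=float)
    xk, xD = x[:-1], x[-1]
    return np.where(xk > 0, (xk / xD) ** (1.0 / gamma), np.nan)

z = np.array([0.8, -0.3, 1.2])
x = z_to_composition(z)
print(x, x.sum())                   # a 4-part composition; the second part is 0
print(composition_to_z(x))          # recovers 0.8 and 1.2; nan where zero
```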

2.6.4 Mixture with Multiplicative Logistic Normal - Stewart & Field

Stewart and Field (2011) deal with predator/prey relationships by looking at fatty acid signatures, which are compositions with essential zeros, $p = (p_1, p_2, \ldots, p_k)$. They use the different possible patterns of essential zeros to partition the whole space of compositions. Then within each cell of the partition in which $p_k$ is strictly positive, they consider the distribution of $\log(p_k/(1 - p_k))$. They make the explicit assumption that the distribution of $\log(p_k/(1 - p_k))$ is the same for all the cells in the partition which have nonzero $p_k$. About the specific details of their approach, they say:

"Let V denote the vector of indices indexing the non-zero components of p and let $p_V$ denote the vector containing the non-zero components of p. ... To model p, it is assumed that there are separate populations for every possible value of V. Let $\theta_v = P[V = v]$ be the marginal probability that an observation comes from the population with non-zero components indexed by v, where $\sum_{b=1}^{B} \theta_{v_b} = 1$ and B denotes the number of populations."

They then construct a univariate normal distribution for each component of the composition. Their distribution is:

f_k(p_k) = \begin{cases} \sum_{\{b : k \notin v_b\}} \theta_{v_b} & \text{if } p_k = 0, \\[4pt] \left(1 - \sum_{\{b : k \notin v_b\}} \theta_{v_b}\right) M(\mu_k, \sigma_k^2) & \text{if } 0 < p_k < 1, \\[4pt] 0 & \text{otherwise.} \end{cases}   (2.23)

Here $M(\mu_k, \sigma_k^2)$ stands for the multiplicative logistic normal distribution, and means:

\log(p_k/(1 - p_k)) \sim N(\mu_k, \sigma_k).   (2.24)

They also consider using another distribution in place of the normal in (2.24), which sometimes fits their data better. A potential weakness of this approach, though it works well for their problem, is that, in their words, they "are then not utilizing any information provided by the other components. . ." (Stewart and Field (2011), p. 51). Their goal is to get a univariate marginal distribution for each component. It isn't to get an estimate of the covariance structure. Though they can compute the joint distribution of any two components, and hence the covariance between any two components, computing the entire covariance matrix is unlikely to be doable with their approach.

2.6.5 Hypersphere Approaches - Stephens; Scealy & Welsh

Stephens (1982) used the fact that in $R^n$, $y_1^2 + y_2^2 + \ldots + y_n^2 = C$, where C is a constant, is an equation describing the surface of a hypersphere. He pointed out that if we have a composition $u = (u_1, \ldots, u_p)$ where $\sum u_i = 1$, we can use the transformation $y_i = \sqrt{u_i}$ to transform the sample space of the composition to points on the surface of a hypersphere. Building on that, Scealy and Welsh (2011) propose a regression model for compositions based on the Kent distribution for directional data. One of the main motivations is that, unlike the logistic normal distribution, zero components are within the domain of the Kent distribution and square root transformation. The density for the Kent distribution is:

f(y) = c(\kappa, \beta)^{-1} \exp\left\{\kappa \gamma_1 \cdot y + \sum_{j=2}^{p} \beta_j (\gamma_j \cdot y)^2\right\}   (2.25)

where $\sum_{j=2}^{p} \beta_j = 0$ and $0 \leq 2|\beta_j| < \kappa$, and the vectors $\{\gamma_j \,|\, j = 1, \ldots, p\}$ are orthonormal. (The normalization constant, $c(\kappa, \beta)^{-1}$, is not defined in the paper. They use MCMC methods and so don't need it.) For the density, conditional on a covariate x, they use:

f(y|x) = c(\kappa, \beta)^{-1} \exp\left[\kappa \mu(x)^T y + \sum_{j=2}^{p-1} \beta_j \{\gamma_j(x)^T y\}^2 - (\beta_2 + \ldots + \beta_{p-1})\{\gamma_p(x)^T y\}^2\right].   (2.26)

They assume that the components of $\mu(x)$ are not 0, and point out (p. 356) that that is a weaker assumption than what is required by the logistic normal distribution, namely, that the components of u are not 0. Their main goal is to show that using the Kent distribution for doing regression on compositional data containing essential zeros works provided the number of zeros is quite small, e.g., a set of 30 4-place compositions (120 total components) of which 5 components are zero. They do not consider the possibility of a probability mass at the point zero, and they make it clear that if there are very many zeros, they suggest a different approach which they have not yet worked out, the folded Kent distribution.

2.6.6 Multinomial Fractional Logit - Mullahy

Mullahy (2010) proposes a regression method for estimating means of compositions where the possible values of the components include the boundaries, zero and one. The goal is estimation of compositional means, not proposal of a distribution for compositions. The method uses multinomial quasi-likelihood and is intended for general applications with essential zeros.

x_t = (x_{t1}, x_{t2}, \ldots, x_{tD}) \text{ is a composition, } 0 \leq x_{tk} \leq B_t.   (2.27)

w_t = (w_{t1}, w_{t2}, \ldots, w_{tp}) \text{ is a vector of covariates.}   (2.28)

\Pr(x_{tk} = 0 | w_t) > 0 \text{ and } \Pr(x_{tk} = B_t | w_t) > 0.   (2.29)

\sum_{k=1}^{D} x_{tk} = B_t.   (2.30)

For the conditions below, $\beta$ is a $p \times D$ matrix of parameters, with the ith column of $\beta$ being a parameter vector for the ith component of the composition; and $\xi_k(w; \beta)$ is a function whose value is the mean of the kth component of the composition. These functions, $\xi_k$, are defined below. They have the property that they are strictly positive and sum to one. About the paper's goal, Mullahy says (p. 7):

"The central goal of this paper is to provide consistent estimation strategies to estimate properties of the conditional distribution of share data that enforce (12) and (13) and accommodate (14) and (15):

E[x_k | w] = \xi_k(w; \beta) \in (0, 1), \quad k = 1, \ldots, D.   (12)   (2.31)

\sum_{k=1}^{D} E[x_k | w] = 1.   (13)   (2.32)

\Pr(x_k = 0 | w) \geq 0, \quad k = 1, \ldots, D.   (14)   (2.33)

\Pr(x_k = 1 | w) \geq 0, \quad k = 1, \ldots, D,   (15)   (2.34)

where \beta = [\beta_1, \beta_2, \ldots, \beta_{D-1}, 0] \text{ is } p \times D."   (2.35)

\xi_k(w; \beta) = \frac{\exp(w\beta_k)}{1 + \sum_{m=1}^{D-1} \exp(w\beta_m)}, \quad k = 1, \ldots, D - 1,   (2.36)

\xi_D(w; \beta) = \frac{1}{1 + \sum_{m=1}^{D-1} \exp(w\beta_m)}.   (2.37)

Suppose there are N data points. If the tth data point is $x_{t\cdot} = (x_{t1}, x_{t2}, \ldots, x_{tD})$, and the vector of covariates for the tth data point is $w_{t\cdot} = (w_{t1}, w_{t2}, \ldots, w_{tp})$, then the quasi-likelihood used for estimating the parameters, $\beta$, is the multinomial logit quasi-likelihood:

L(\beta) = \prod_{t=1}^{N} \prod_{m=1}^{D} \xi_m(w_t; \beta_m)^{x_{tm}}.   (2.38)

It uses the logistic functions for the component means, $\xi_k(\cdot)$, as the 'probabilities', with exponents the observed shares $x_{tk} \in [0, 1]$ in place of integer counts in a multinomial density. Mullahy argues that consistency of this quasi-likelihood follows from Papke and Wooldridge (1996, 2008) and Gourieroux et al. (1984b,a).
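A sketch (numpy assumed) of the multinomial-logit mean functions (2.36)-(2.37) and the quasi-log-likelihood corresponding to (2.38); here W is an N x p covariate matrix, X an N x D matrix of shares, and beta a p x (D - 1) coefficient matrix (the Dth column fixed at zero), all of which are made-up names for illustration.

```python
import numpy as np

def xi(W, beta):
    """Mean functions xi_k(w; beta): returns an N x D matrix of fitted shares."""
    eta = W @ beta                                    # N x (D-1) linear predictors
    expeta = np.exp(eta)
    denom = 1.0 + expeta.sum(axis=1, keepdims=True)
    return np.hstack([expeta / denom, 1.0 / denom])   # each row sums to 1

def quasi_loglik(beta, W, X):
    """log of the quasi-likelihood (2.38): sum_t sum_m x_tm * log xi_m(w_t; beta)."""
    return np.sum(X * np.log(xi(W, beta)))

rng = np.random.default_rng(1)
N, p, D = 50, 2, 3
W = np.column_stack([np.ones(N), rng.normal(size=N)])   # intercept + one covariate
beta = rng.normal(size=(p, D - 1))
X = rng.dirichlet(np.ones(D), size=N)                   # simulated share data
print(quasi_loglik(beta, W, X))
```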

Estimators are in the unit interval: $0 < \hat{\xi}_m(w_t; \beta) < 1$. For $x_k$, the curve $y = \hat{\xi}_k(w_t; \beta)^{x_k}$ is monotonic decreasing, as seen in Figure 2.1, which displays the curves for selected values of $\hat{\xi}_k(w_t; \beta)$: 0.01, 0.2, 0.7, 0.99. (In the legend, we shorten $\hat{\xi}_k(w_t; \beta)$ to $\xi_k$.) The maximum values of the curves are at $x_k = 0$, and their minimum values are at $x_k = 1$. They don't admit the possibility of a maximum value away from the boundary.

[Figure 2.1: Monotonic Quasi-Likelihood Curves]

Mullahy's paper presents a way of estimating means of compositions based on a pseudo-likelihood. It does not propose a model for the distribution of compositions. That goal is different from ours. Ours is to develop a distribution model for compositions with zeros.

2.6.7 Variation Matrix - Aitchison & Kay

In a very brief paper, Aitchison and Kay (2003) outlined an approach to the essential zeros problem. Their idea is to use a mixture model where one part is an incidence vector, u, indicating where the zeros occur, and the second part is a logistic normal. Their model for the presence of zeros is:

b(u | \theta) = \prod_{i=1}^{D} \theta_i^{u_i} (1 - \theta_i)^{1 - u_i}, \qquad u_i = 0, 1 \; (i = 1, \ldots, D).   (2.39)

This is the density function of D independent Bernoulli variables with success probability $\theta_i$. They go on to say, "At the second stage nonzero components are generated by subcompositional density functions based on a logistic normal density function $\psi(x|\xi, T)$ associated with the full composition; in other words x follows a $L^D(\xi, T)$ distribution." The expression they give as the likelihood is this:

L(\theta, \xi, T | \text{data}) = \prod_{n=1}^{N} p(u_n | \theta) \, \psi(x_{J(u_n)} | \xi_{J(u_n)}, T_{J(u_n)}),   (2.40)

where

p(u_n | \theta) = \prod_{i=1}^{D} \theta_i^{u_{ni}} (1 - \theta_i)^{1 - u_{ni}}.   (2.41)

Here $x_{J(u_n)}$ is the subcomposition of x having only positive components. Similarly, $\xi_{J(u_n)}$ and $T_{J(u_n)}$ are the corresponding subvector and submatrix of $\xi$ and T from Section 2.4.2. Aitchison and Kay do not specify $\psi$, and do not show how to estimate parameters. Our model is based on this approach. We make a simplifying assumption which allows us to specify the subcompositional densities in terms of $\Omega$ and $\mu$, and allows use of the logistic normal density. In addition we derive maximum likelihood and heuristic parameter estimators, define a two-group test of means, and apply them to a set of real data from a diabetes study.

Chapter 3

Mixture of Logistic Normals

3.1 Simplifying Assumption

In this section we outline our method for building a mixture distribution for modeling compositions containing essential zeros. We leave details about the mixing weights for Section 6.2. A key simplifying assumption we make throughout is that the Dth component of the composition is positive. We allow zeros in the other components, $1, 2, \ldots, d$. (Recall $D = d + 1$.) This assumption is common to the work of both Martín-Fernández et al. (2011) and Leininger et al. (2013). Without zeros, the logistic normal likelihood has been shown to be permutation invariant (Aitchison (1986)). In our extension allowing zeros, the likelihood is invariant to the choice of the component chosen as the reference, provided the same reference part is used throughout the model.

Let $x = (x_1, x_2, \ldots, x_d, x_D)^T$ be a composition with $x_i < 1$ for all $i \in \{1, 2, \ldots, d, D\}$ and $x_D > 0$. Let $W = \{i : i \in \{1, 2, \ldots, d\}, x_i > 0\}$, the set of indices for the parts of x (other than $x_D$) which are nonzero. There are $2^d - 1$ possible sets W, because W cannot be empty. We index them as $W_\ell$, with $\ell \in \{1, 2, \ldots, 2^d - 1\}$, elements of the power set, $W_\ell \in P(\{1, 2, \ldots, d\})$. Sometimes we refer to these sets with incidence vectors whose ith component satisfies $V_{W_\ell, i} = 1 \iff x_i > 0$ and $V_{W_\ell, i} = 0 \iff x_i = 0$.

3.2 Logistic Normal Mixture Model

We define a logistic normal mixture model

g(x; \mu, \Omega) = \sum_{\ell=1}^{2^d - 1} P(W_\ell) \, L(x; \mu_{W_\ell}, \Omega_{W_\ell}),   (3.1)

where the mixing parameters, $P(W_\ell)$, denote the probability of occurrence of $W_\ell$, and $\sum_{\ell=1}^{2^d - 1} P(W_\ell) = 1$. The distributions $L(x; \mu_{W_\ell}, \Omega_{W_\ell})$ are derived from a common logistic normal distribution, $L(x; \mu, \Omega)$, where $\mu \in R^d$ is a d-part location vector, and $\Omega \in R^{d \times d}$ is a positive definite dispersion matrix. To ease the discussion we will refer to $\mu$ and $\Omega$ as mean vector and variance-covariance matrix respectively, although they are not moments of the distribution. For the distributions derived from the logistic normal parent distribution, the parameters $\mu_{W_\ell}$ and $\Omega_{W_\ell}$ are defined in terms of the parameters $\mu$, $\Omega$, $W_\ell$, and a selection matrix, $B_{W_\ell}$, which we now define.

Let $W_\ell \subset \{1, 2, 3, \ldots, d\}$ be a nonempty set of indices. Without loss of generality we can order the indices from least to greatest:

W_\ell = \{j_1, j_2, \ldots, j_J\} \quad \text{where } 0 < j_1 < j_2 < \ldots < j_J \leq d.   (3.2)

We define our $J \times d$ selection matrix, $B_{W_\ell} = [B_{i,m}]$. For $i \in \{1, 2, \ldots, J\}$ and $m \in \{1, 2, \ldots, d\}$, with $W_\ell = \{j_1, j_2, \ldots, j_J\}$, we define the elements of $[B_{i,m}]$ to be $B_{i, j_i} = 1$ and $B_{i,m} = 0$ for $m \neq j_i$. For example, let $x = (.2, 0, .3, 0, .25, .25)$, a 6-part composition. The set of nonzero indices is $W_\ell = \{1, 3, 5\}$, and the selection matrix is

B_{W_\ell} = B_{\{1,3,5\}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.
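A small sketch (numpy assumed) of the $J \times d$ selection matrix $B_{W_\ell}$ just defined, built by placing a 1 in row i at column $j_i$; the function name is ours.

```python
import numpy as np

def selection_matrix(W, d):
    """B_W: J x d matrix with B[i, j_i] = 1 for the sorted indices j_i in W
    (indices are 1-based, as in the text)."""
    idx = sorted(W)
    B = np.zeros((len(idx), d))
    for i, j in enumerate(idx):
        B[i, j - 1] = 1.0
    return B

# Example from the text: x = (.2, 0, .3, 0, .25, .25), so d = 5 and W = {1, 3, 5}.
B = selection_matrix({1, 3, 5}, d=5)
print(B)

mu = np.array([-1.0, 0.5, 0.2, -0.3, 0.9])   # made-up location vector
print(B @ mu)                                # mu_W: the subvector (mu_1, mu_3, mu_5)
```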

We define:

\mu_{W_\ell} = B_{W_\ell} \mu,
\Omega_{W_\ell} = B_{W_\ell} \Omega B_{W_\ell}^T.

With this structure, the mixture distribution can be more fully specified:

g(x; \mu, \Omega) = \sum_{\ell=1}^{2^d - 1} P(W_\ell) \, L^{||W_\ell||}(x; \mu_{W_\ell}, \Omega_{W_\ell}), \quad \text{where}

• $||W_\ell||$ refers to the cardinality of the set $W_\ell$.
• $\sum_\ell P(W_\ell) = 1$.
• $\mu \in R^d$.
• $\mu_{W_\ell}$ is a subvector of $\mu$ corresponding to the $W_\ell$ pattern of zeros.
• $\Omega_{W_\ell}$ is a submatrix of a $d \times d$ positive definite covariance matrix corresponding to the $W_\ell$ pattern of zeros.

3.2.1 Multivariate Normal Foundation

Now we extend the notation for the inverse of the additive logratio transformation, alr^{-1}, from Aitchison (1986). We use the new symbols $\widetilde{alr}$ and $\widetilde{alr}^{-1}$, and define them in terms of W and D, the maximum index.

In our approach there is a tight correspondence between the $y_i$ variables of a multivariate normal vector and the $x_i$ parts of a composition. Equation (3.4) shows the correspondence when there are no zeros:

y_i = \log(x_i/x_D) \quad \text{for } i = 1, 2, 3, \ldots, d.   (3.3)

Composition:              x = (x_1, x_2, x_3, \ldots, x_d, x_D)^T
alr transformed vector:   y = \log(x_{-D}/x_D) = (y_1, y_2, y_3, \ldots, y_d)^T,   (3.4)

with each $y_i$ aligned under the corresponding part $x_i$, and no element under $x_D$. When there is an essential zero in one of the $x_i$ parts of the composition, e.g., in $x_2$, we use a placeholder, as in Equation (3.5):

Composition:                           x = (x_1, 0, x_3, \ldots, x_d, x_D)^T
\widetilde{alr} transformed subvector: y = \log(x_{-\{2,D\}}/x_D) = (y_1, \,\cdot\,, y_3, \ldots, y_d)^T.   (3.5)

When we have an essential zero in the ith part of the composition, we have a selected subvector of the y's which does not contain an element corresponding to $y_i$. The requirement that $x_D > 0$ is what allows us to maintain this strict correspondence between $x_i$ and $y_i$.

Now we define $\widetilde{alr}^{-1}$. For $y = (y_{i_1}, y_{i_2}, \ldots, y_{i_r})^T$, for $j \in \{1, 2, \ldots, d, D\}$, and $W = \{i_1, i_2, \ldots, i_r\} \subset \{1, 2, \ldots, d\}$, we define:

\widetilde{alr}^{-1}(y, W, D) = (x_1, \ldots, x_d, x_D), \quad \text{where}

x_j = \begin{cases} \dfrac{\exp(y_j)}{\exp(y_{i_1}) + \exp(y_{i_2}) + \cdots + \exp(y_{i_r}) + 1} & \text{if } j \in W, \\[6pt] 0 & \text{if } j \notin W \text{ and } j \in \{1, \ldots, d\}, \\[6pt] \dfrac{1}{\exp(y_{i_1}) + \exp(y_{i_2}) + \cdots + \exp(y_{i_r}) + 1} & \text{if } j = D. \end{cases}   (3.6)

Next we turn to defining an extension to the logistic normal distribution.

Definition: Let $x = (x_1, x_2, x_3, \ldots, x_d, x_D)^T$ be a composition with $x_D > 0$. Let $W_\ell = \{i_1, i_2, \ldots, i_r\} \subset \{1, 2, \ldots, d\}$ be a nonempty set of indices of nonzero components of x. Let $B_{W_\ell}$ be the corresponding selection matrix, and let

y = \log(B_{W_\ell} x_{-D}/x_D) = (y_{i_1}, y_{i_2}, \ldots, y_{i_r})^T = \widetilde{alr}(x, W_\ell, D)

be the logratios of the nonzero components of x. If for every set $W_\ell$ of indices of nonzero components of x we have $y \sim N(\mu_{W_\ell}, \Omega_{W_\ell})$, then x has a logistic normal distribution with essential zeros, written $x \sim L(\mu, \Omega)$, with probability density function

g(x; \mu, \Omega) = \sum_{\ell=1}^{2^d - 1} P(W_\ell) \, L^{||W_\ell||}(x; \mu_{W_\ell}, \Omega_{W_\ell}),

where

\sum_\ell P(W_\ell) = 1,
\mu \text{ is a d-part vector in } R^d,
\mu_{W_\ell} \text{ is a subvector of } \mu \text{ corresponding to the } W_\ell \text{ pattern of zeros,}
\Omega \text{ is a } d \times d \text{ positive definite covariance matrix, and}
\Omega_{W_\ell} \text{ is a square submatrix of } \Omega \text{ corresponding to the } W_\ell \text{ pattern of zeros.}

For the case where $W = \{1, 2, \ldots, d\}$, the composition $x = (x_1, x_2, \ldots, x_D)^T$ has the additive logistic normal distribution, $L^d(\mu, \Omega)$.
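A sketch (numpy assumed) of $\widetilde{alr}$ and $\widetilde{alr}^{-1}$ as given in (3.6): logratios are formed only for the nonzero parts, and the inverse places zeros back in the positions not in W. Function names are ours.

```python
import numpy as np

def alr_tilde(x):
    """Return (y, W): logratios log(x_i/x_D) for the nonzero parts i in W,
    where W holds the 1-based indices i <= d with x_i > 0 (x_D must be > 0)."""
    x = np.asarray(x, dtype=float)
    W = [i + 1 for i in range(x.size - 1) if x[i] > 0]
    y = np.log(x[[i - 1 for i in W]] / x[-1])
    return y, W

def alr_tilde_inv(y, W, D):
    """Inverse map (3.6): rebuild a D-part composition with zeros outside W."""
    denom = np.exp(y).sum() + 1.0
    x = np.zeros(D)
    x[[i - 1 for i in W]] = np.exp(y) / denom
    x[D - 1] = 1.0 / denom
    return x

x = np.array([0.2, 0.0, 0.3, 0.0, 0.25, 0.25])
y, W = alr_tilde(x)
print(W, y)                                       # W = [1, 3, 5]
print(np.allclose(alr_tilde_inv(y, W, D=6), x))   # True
```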

3.2.2 Common Expectations and Variances

The definition of $\widetilde{alr}^{-1}$ enables compositions from different subdistributions to be used to estimate parameters of their shared parent distribution. Let $x_1 = (x_{11}, x_{21}, \ldots, x_{D1})^T$ and $x_2 = (x_{12}, x_{22}, \ldots, x_{D2})^T$ with

x_1 \sim L^{||W_1||}(\mu_{W_1}, \Omega_{W_1}), \quad \text{and}   (3.7)
x_2 \sim L^{||W_2||}(\mu_{W_2}, \Omega_{W_2}).   (3.8)

The two sets of nonzero indices, $W_1$ and $W_2$, need not have any elements in common, nor do they need to have the same number of elements, though $x_1$ and $x_2$ both have D elements. Suppose they have an index, m, in common: $m \in W_1 \cap W_2$. By properties of the logistic normal distribution (Aitchison (1986), p. 116), and the definition of $\widetilde{alr}^{-1}$ in Equation (3.6), we have:

E[\log(x_{m1}/x_{D1})] = E[y_m] = \mu_m = E[y_m] = E[\log(x_{m2}/x_{D2})].   (3.9)

And similarly,

\mathrm{Var}[\log(x_{m1}/x_{D1})] = \mathrm{Var}[y_m] = \sigma_m^2 = \mathrm{Var}[y_m] = \mathrm{Var}[\log(x_{m2}/x_{D2})].   (3.10)

Thus, compositions from different subdistributions of the same logistic normal distribution can be used to estimate the parameters of their shared parent distribution.

3.3 Data Blocks

Now that we have a correspondence between multivariate normal variables and compositions with zeros, we could derive a density function using the standard formula for transformed variables, analogous to Aitchison (1986), Chapter 6. However, it is more convenient to work in the space of the transformed variables (multivariate normal projections). Here we apply the techniques and notation of block matrices and matrix calculus to do some preparation in order to build a likelihood and attack the problem of finding estimators for the parameters. We discuss two sets of estimators: a general maximum likelihood estimator, and a simpler pair of estimators reminiscent of method of moments estimators.

3.3.1 Block Matrices of Compositions

We write a collection of compositional data with zeros, X ($n \times D$), as a column of blocks of compositions, where each block, $X_\ell$, has a particular pattern of zeros throughout. That is, for a particular block, $X_\ell$, and $i \in \{1, 2, \ldots, d\}$, the ith column of $X_\ell$ is either all positive or all zero.

Let X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_b \end{pmatrix}.   (3.11)

The dimensions of the blocks are $X_1$ ($r_1 \times D$), $X_2$ ($r_2 \times D$), ..., $X_b$ ($r_b \times D$), and the sum of their vertical dimensions is $r_1 + r_2 + \ldots + r_b = n$, where n is the number of data points. We use $\ell$ to indicate a block, and t to indicate a composition (row) in that block.

Next we define the patterns of zeros in each block. Here $i \in \{1, 2, \ldots, D\}$. For $\ell \in \{1, 2, \ldots, b\}$, let $W_\ell \subset \{1, 2, \ldots, d\}$ be the set of indices of strictly positive components of $X_\ell$.

For $\ell \in \{1, 2, \ldots, b\}$, $X_\ell = [x_{ti}]$ where

x_{ti} > 0 \text{ if } i = D,
x_{ti} > 0 \text{ if } i \in W_\ell,
x_{ti} = 0 \text{ if } i \notin W_\ell \text{ and } i \neq D.   (3.12)

3.3.2 Transformations - Ratios and Logratios

We have already defined the alr transformation for the case where there are no zeros in (2.6). Next we extend alr to $\widetilde{alr}$ for a block matrix of compositions, $X_\ell$ ($r_\ell \times D$), which may contain zeros. We do this by defining a selection matrix $B_{W_\ell}$ corresponding to the set $W_\ell$. We still have $W_\ell \subset \{1, 2, 3, \ldots, d\}$ being a nonempty set of indices of the nonzero components of x, and without loss of generality we can order the indices from least to greatest:

W_\ell = \{j_1, j_2, \ldots, j_J\} \quad \text{where } 0 < j_1 < j_2 < \ldots < j_J < D.   (3.13)

Now we define our $(J + 1) \times D$ selection matrix, $B_{W_\ell} = [B_{p,m}]$. We use $J + 1$ here because we construct the selection matrix so that the final, Dth, component of the data is always selected. This is slightly different than before. Previously we constructed B to conform to the parameters $\mu$ ($d \times 1$) and $\Omega$ ($d \times d$).

For $p \in \{1, 2, \ldots, J + 1\}$ and $m \in \{1, 2, \ldots, D\}$, with $W_\ell = \{j_1, j_2, \ldots, j_J\}$,   (3.14)
we define the elements of $[B_{p,m}]$ to be $B_{p, j_p} = 1$ and $B_{p,m} = 0$ for $m \neq j_p$.   (3.15)

$X_\ell B_{W_\ell}^T$ is a matrix where each row vector is a composition without zeros:

X_\ell B_{W_\ell}^T = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_{r_\ell}^T \end{pmatrix} B_{W_\ell}^T = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1(J+1)} \\ x_{21} & x_{22} & \cdots & x_{2(J+1)} \\ \vdots & & & \vdots \\ x_{r_\ell 1} & x_{r_\ell 2} & \cdots & x_{r_\ell (J+1)} \end{pmatrix}.   (3.16)

We define \widetilde{alr}(X_\ell, W_\ell, D) = \mathrm{alr}(X_\ell B_{W_\ell}^T) = \begin{pmatrix} \mathrm{alr}(x_1^T) \\ \mathrm{alr}(x_2^T) \\ \vdots \\ \mathrm{alr}(x_{r_\ell}^T) \end{pmatrix} = \begin{pmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_{r_\ell}^T \end{pmatrix}.   (3.17)

The matrix on the left is $r_\ell \times (J + 1)$ and the result is $r_\ell \times J$.

Let Y_\ell = \widetilde{alr}(X_\ell, W_\ell, D), \quad \text{an } r_\ell \times J \text{ matrix}.   (3.18)

Each row vector in $Y_\ell$ is a vector of reals, all potentially from the multivariate normal distribution corresponding to the $\ell$th pattern of zeros. Note that we cannot form a single block matrix, Y, from the collection of the $Y_\ell$ because they can have different numbers of columns.

3.3.3 Means

The matrix $Y_\ell$ contains rows of logratios from compositions with the same pattern of zeros. We refer to the tth row vector of $Y_\ell$ as $y_{\ell t}^T$. We refer to the mean as the vector $\bar{y}_\ell$, and define it as:

\bar{y}_\ell = \frac{1}{r_\ell} (1_{r_\ell}^T Y_\ell)^T, \quad \text{a } J \times 1 \text{ vector}.   (3.19)

Here we are using $1_{r_\ell}$ to represent an $r_\ell \times 1$ column vector of ones. $\bar{y}_\ell$ is a column vector. This definition eases its use in quadratic forms from the multivariate normal density.

3.4 Simple Estimators

3.4.1 Illustration - Spices, Lentils, and Rice

Here we define a simple estimator for the mean parameter, and for illustration, we will be referring to the following example with artificial data. Suppose we have compositional data on how much money Bill spends on rice, lentils, and spices when he buys food. Suppose he buys in bulk, and occasionally the store is out of either the spices or lentils, but they always have plenty of rice. Table 3.1 shows a set of such compositions where some of the entries, for spices or lentils, are zero. There are three patterns of zeros.

      spices  lentils  rice
  1    0.16    0.00    0.84
  2    0.17    0.00    0.83
  3    0.16    0.00    0.84
  4    0.00    0.37    0.63
  5    0.00    0.37    0.63
  6    0.00    0.37    0.63
  7    0.12    0.33    0.55
  8    0.11    0.34    0.56
  9    0.12    0.32    0.56
 10    0.10    0.34    0.56
 11    0.10    0.33    0.57
 12    0.11    0.33    0.55

Table 3.1: Expenditures on Spices, Lentils, and Rice

Tables 3.2-3.4 show the result of applying the $\widetilde{alr}$ transformation.

X1 corresponds to rows 1-3 and its set of indices is W1 = {1}.

X2 corresponds to rows 4-6 and its set of indices is W2 = {2}.

X3 corresponds to rows 7-12 and its set of indices is W3 = {1, 2}.

     log(spices/rice)
  1      -1.66
  2      -1.59
  3      -1.68

Table 3.2: $Y_1 = \widetilde{alr}(X_1, \{1\}, 3)$

     log(lentils/rice)
  4      -0.52
  5      -0.52
  6      -0.51

Table 3.3: $Y_2 = \widetilde{alr}(X_2, \{2\}, 3)$

     log(spices/rice)   log(lentils/rice)
  7      -1.56              -0.52
  8      -1.64              -0.51
  9      -1.52              -0.57
 10      -1.72              -0.51
 11      -1.77              -0.53
 12      -1.58              -0.51

Table 3.4: $Y_3 = \widetilde{alr}(X_3, \{1, 2\}, 3)$

3.4.2 Mean

If $X = [x_{ti}]$ ($n \times D$) is a collection of n compositional data points with zeros, with the Dth component always strictly positive, we can define a simple estimator of the mean, $\mu^* = (\mu^*_1, \mu^*_2, \ldots, \mu^*_d)^T$. Let $n_i$ be the number of elements of the ith column of X that are nonzero. For $i \in \{1, 2, \ldots, d\}$ and $t \in \{1, 2, \ldots, n\}$, define

\mu^*_i = \frac{1}{n_i} \sum_{\{t : x_{ti} \neq 0\}} \log(x_{ti}/x_{tD}).   (3.20)

By the assumption of normality of the logratios, the estimator $\mu^*$ is unbiased. In the spices-lentils-rice example, $\mu^* = (\log(\text{spices/rice}): -1.635, \ \log(\text{lentils/rice}): -0.523)$. For ease of interpretation, we convert the estimate back to a composition with the alr^{-1} transformation: (spices: 0.109, lentils: 0.332, rice: 0.559). That is, our estimate of Bill's mean expenditure is 10.9% on spices, 33.2% on lentils, and 55.9% on rice.
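A sketch (numpy assumed) of the simple location estimator (3.20) applied to the Table 3.1 data; each $\mu^*_i$ averages $\log(x_{ti}/x_{tD})$ over only the rows where part i is nonzero. Because the table values are rounded to two decimals, the printed numbers only approximate those quoted above.

```python
import numpy as np

def simple_mu(X):
    """mu*_i = mean of log(x_ti / x_tD) over rows t with x_ti > 0 (Eq. 3.20)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1] - 1
    mu = np.empty(d)
    for i in range(d):
        rows = X[:, i] > 0
        mu[i] = np.mean(np.log(X[rows, i] / X[rows, -1]))
    return mu

# Table 3.1: expenditures on (spices, lentils, rice).
X = np.array([
    [0.16, 0.00, 0.84], [0.17, 0.00, 0.83], [0.16, 0.00, 0.84],
    [0.00, 0.37, 0.63], [0.00, 0.37, 0.63], [0.00, 0.37, 0.63],
    [0.12, 0.33, 0.55], [0.11, 0.34, 0.56], [0.12, 0.32, 0.56],
    [0.10, 0.34, 0.56], [0.10, 0.33, 0.57], [0.11, 0.33, 0.55],
])
mu_star = simple_mu(X)
comp = np.append(np.exp(mu_star), 1.0) / (np.exp(mu_star).sum() + 1.0)
print(mu_star)   # roughly (-1.63, -0.52)
print(comp)      # back-transformed: roughly (0.11, 0.33, 0.56)
```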

3.4.3 Variance

Here we show how to find estimates for variances and covariances of our mixture model using maximum likelihood estimators for normal random variables. For a single random composition, x, with components $x_1, x_2, \ldots, x_D$, we substitute $\log(x_i/x_D)$ into the MLE for variances of normal random variables. We use $\sigma^*_{ii}$ for the estimator of the variance of the logratio $\log(x_i/x_D)$, for $i \in \{1, 2, \ldots, d\}$ and $t \in \{1, 2, \ldots, n_i\}$:

\sigma^*_{ii} = \frac{1}{n_i} \sum_{\{t : x_{ti} \neq 0\}} \big(\log(x_{ti}/x_{tD}) - \mu^*_i\big)^2.   (3.21)

If we want an unbiased estimator, we can divide by $(n_i - 1)$ instead of $n_i$. As with the means, the different $\sigma^*_{ii}$ are based on different numbers of observations, $n_i$.

3.4.4 Covariance

It only makes sense to talk about estimating the covariance of the variables $\log(x_i/x_D)$ and $\log(x_j/x_D)$ when both $x_{ti}$ and $x_{tj}$ are not 0, so we define $n_{ij} = ||\{t : x_{ti} \neq 0 \;\&\; x_{tj} \neq 0\}||$. That is, $n_{ij}$ is the number of data points where both $x_{ti}$ and $x_{tj}$ are not 0. As we did with the variance, we can start with the canonical maximum likelihood formula for estimating covariance among normally distributed variables, and substitute in the appropriate logratios:

\sigma^*_{ij} = \frac{1}{n_{ij}} \sum_{\{t : x_{ti} \neq 0 \;\&\; x_{tj} \neq 0\}} \big(\log(x_{ti}/x_{tD}) - \mu^*_i\big)\big(\log(x_{tj}/x_{tD}) - \mu^*_j\big).   (3.22)

Note that $\sigma^*_{ij}$ is based on $n_{ij}$ observations, while $\mu^*_i$ and $\mu^*_j$ are based on $n_i$ and $n_j$ observations, respectively. The formula in Equation (3.22) is based on the maximum likelihood estimator for the covariance of normal variables. For unbiased estimators we would divide by $(n_{ij} - 1)$ instead of $n_{ij}$.

Our estimator for the $d \times d$ variance-covariance matrix is $\Omega^* = [\sigma^*_{ij}]$. There are two potential problems with this approach. There could be i, j, $i \neq j$, such that whenever $x_i > 0$, $x_j = 0$. In that case we cannot estimate the covariance. Also, irrespective of that, the estimate of the variance-covariance matrix, $\Omega^*$, might not be positive definite. In the spices-lentils-rice example,

\Omega^* = \begin{pmatrix} 0.00648 & -0.00096 \\ -0.00096 & 0.00035 \end{pmatrix},

which is positive definite. The covariance term is negative but quite small. When Bill's expenses on lentils increase, his expenses on spices tend to decrease.

3.5 Maximum Likelihood Estimators

For the case where there are no zeros, $\mu^*$ is a maximum likelihood estimator (MLE), but in general it is not. From now on we will call $\mu^*$ the simple estimator, to contrast it with the MLE, which we derive next. We start by finding the location MLE given $\Omega$ for 3-part compositions, show it is unbiased, and then show the relative efficiency of the simple estimator with respect to the MLE. Assume we have a set of logistic normal compositional data with b different patterns of zeros as in (5.4), spelled out in more detail in Equation (3.23):

x_{11}, \ldots, x_{1r_1} \overset{i.i.d.}{\sim} L^{||W_1||}(B_{W_1}\mu, B_{W_1}\Omega B_{W_1}^T) \quad \text{(rows of } X_1\text{)}
x_{21}, \ldots, x_{2r_2} \overset{i.i.d.}{\sim} L^{||W_2||}(B_{W_2}\mu, B_{W_2}\Omega B_{W_2}^T) \quad \text{(rows of } X_2\text{)}
\vdots
x_{b1}, \ldots, x_{br_b} \overset{i.i.d.}{\sim} L^{||W_b||}(B_{W_b}\mu, B_{W_b}\Omega B_{W_b}^T) \quad \text{(rows of } X_b\text{)}   (3.23)

X_\ell = [x_{\ell t i}], \quad \text{where}

\ell \in \{1, 2, \ldots, b\} \text{ identifies the } \ell\text{th pattern of zeros; } x_{\ell t i} \neq 0 \text{ if } i \in W_\ell,
t \in \{1, 2, \ldots, r_\ell\} \text{ identifies the tth observation in group } \ell,
i \in \{1, 2, \ldots, D\} \text{ indexes the component of the composition.}

We use $x_{\ell t}$ to refer to the tth compositional observation with the $W_\ell$ pattern of zeros. We define $y_{\ell t} = \widetilde{alr}(x_{\ell t}, W_\ell, D)$, and to ease notation, we write the likelihood in terms of the $y_{\ell t}$.

3.5.1 Likelihood

First we write the likelihood and log likelihood for D-part compositions, conditional on $P(W_\ell)$, and then restrict ourselves to 3-part compositions. The likelihood conditional on $P(W_\ell)$ is:

L(\mu, \Omega | P, r_1, \ldots, r_b, y_{11}, \ldots, y_{br_b}) = \prod_{\ell=1}^{b} \prod_{t=1}^{r_\ell} \frac{\big(\prod_{i \in W_\ell} x_{\ell t i}^{-1}\big) P(W_\ell)}{(2\pi)^{||W_\ell||/2} \, |B_{W_\ell}\Omega B_{W_\ell}^T|^{1/2}} \exp\left\{-\tfrac{1}{2}(y_{\ell t} - B_{W_\ell}\mu)^T (B_{W_\ell}\Omega B_{W_\ell}^T)^{-1} (y_{\ell t} - B_{W_\ell}\mu)\right\}.   (3.24)

The constant

\prod_{\ell=1}^{b} \prod_{t=1}^{r_\ell} \frac{\big(\prod_{i \in W_\ell} x_{\ell t i}^{-1}\big) P(W_\ell)}{(2\pi)^{||W_\ell||/2} \, |B_{W_\ell}\Omega B_{W_\ell}^T|^{1/2}}   (3.25)

is independent of $\mu$, so for purposes of maximizing the likelihood with respect to $\mu$, we can treat it as a single constant, C:

L(\mu, \Omega | r_1, \ldots, r_b, y_{11}, \ldots, y_{br_b})
= C \prod_{\ell=1}^{b} \prod_{t=1}^{r_\ell} \exp\left\{-\tfrac{1}{2}(y_{\ell t} - B_{W_\ell}\mu)^T (B_{W_\ell}\Omega B_{W_\ell}^T)^{-1} (y_{\ell t} - B_{W_\ell}\mu)\right\}   (3.26)
= C \exp\left\{-\tfrac{1}{2}\sum_{\ell=1}^{b}\sum_{t=1}^{r_\ell}(y_{\ell t} - B_{W_\ell}\mu)^T (B_{W_\ell}\Omega B_{W_\ell}^T)^{-1} (y_{\ell t} - B_{W_\ell}\mu)\right\}.   (3.27)

Taking the log gives:

\log L(\mu, \Omega | r_1, \ldots, r_b, y_{11}, \ldots, y_{br_b}) = \log C - \tfrac{1}{2}\sum_{\ell=1}^{b}\sum_{t=1}^{r_\ell}(y_{\ell t} - B_{W_\ell}\mu)^T (B_{W_\ell}\Omega B_{W_\ell}^T)^{-1} (y_{\ell t} - B_{W_\ell}\mu).   (3.28)

For the simple case of three-part compositional data with some zeros in component one and some zeros in component two, the parent distribution is a three-part logistic normal with parameters µ (2×1) and Ω (2×2):

\[
N(\boldsymbol{\mu}, \Omega), \quad \text{where } \boldsymbol{\mu} = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}
\text{ and } \Omega = \begin{pmatrix} s_{11} & s_{12}\\ s_{12} & s_{22}\end{pmatrix}. \tag{3.29}
\]

For the inverse of Ω we have

\[
A = \Omega^{-1} = \frac{1}{s_{11}s_{22} - s_{12}^{2}}\begin{pmatrix} s_{22} & -s_{12}\\ -s_{12} & s_{11}\end{pmatrix}. \tag{3.30}
\]

For the two univariate patterns of zeros, the inverses of the variances are 1/s11 and 1/s22. In these formulas,

y_{1j1} is the jth data point among the univariate data from the first component;
y_{2j2} is the jth data point among the univariate data from the second component;
y_{3j} is a 2-part vector with data from both components,

\[
\mathbf{y}_{3j} = \begin{pmatrix} y_{3j1}\\ y_{3j2}\end{pmatrix}. \tag{3.31}
\]

In the example, the y_{1j1} correspond to elements of Table 3.2, log(spices/rice). The y_{2j2} correspond to elements of Table 3.3, log(lentils/rice), and the y_{3j} correspond to elements of Table 3.4, both log(spices/rice) and log(lentils/rice). We define the means of these matrices in the usual way:

\[
\frac{1}{r_1}\sum_{j=1}^{r_1} y_{1j1} = \bar{y}_{11}, \qquad
\frac{1}{r_2}\sum_{j=1}^{r_2} y_{2j2} = \bar{y}_{22}, \qquad
\frac{1}{r_3}\sum_{j=1}^{r_3} \mathbf{y}_{3j} = \begin{pmatrix}\bar{y}_{31}\\ \bar{y}_{32}\end{pmatrix}. \tag{3.32}
\]

3.5.2 Partial Derivatives

\[
\frac{\partial \log L(\boldsymbol{\mu}\mid Y, \Omega, r_1, r_2, r_3)}{\partial \mu_1}
= \frac{1}{s_{11}}\, r_1(\bar{y}_{11}-\mu_1)
+ \frac{s_{22}}{s_{11}s_{22}-s_{12}^{2}}\, r_3(\bar{y}_{31}-\mu_1)
+ \frac{-s_{12}}{s_{11}s_{22}-s_{12}^{2}}\, r_3(\bar{y}_{32}-\mu_2) \tag{3.33}
\]

\[
\frac{\partial \log L(\boldsymbol{\mu}\mid Y, \Omega, r_1, r_2, r_3)}{\partial \mu_2}
= \frac{1}{s_{22}}\, r_2(\bar{y}_{22}-\mu_2)
+ \frac{s_{11}}{s_{11}s_{22}-s_{12}^{2}}\, r_3(\bar{y}_{32}-\mu_2)
+ \frac{-s_{12}}{s_{11}s_{22}-s_{12}^{2}}\, r_3(\bar{y}_{31}-\mu_1) \tag{3.34}
\]

3.5.3 MLE for Location, Given Ω

We set the partial derivatives equal to zero, replace µ with µ̂, and solve. The result is:

\[
\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3 =
\frac{(r_1\bar{y}_{11}+r_3\bar{y}_{31})(r_2+r_3)s_{11}s_{22} - r_1\bar{y}_{11}\, r_2 s_{12}^{2} + (\bar{y}_{22}-\bar{y}_{32})\, r_2 r_3 s_{11} s_{12}}
     {(r_1+r_3)(r_2+r_3)s_{11}s_{22} - r_1 r_2 s_{12}^{2}} \tag{3.35}
\]

\[
\hat{\mu}_2 \mid \Omega, r_1, r_2, r_3 =
\frac{(r_2\bar{y}_{22}+r_3\bar{y}_{32})(r_1+r_3)s_{11}s_{22} - r_2\bar{y}_{22}\, r_1 s_{12}^{2} + (\bar{y}_{11}-\bar{y}_{31})\, r_1 r_3 s_{12} s_{22}}
     {(r_1+r_3)(r_2+r_3)s_{11}s_{22} - r_1 r_2 s_{12}^{2}} \tag{3.36}
\]
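A small sketch of these closed forms in Python follows; the function and argument names are ours, and it simply transcribes Equations (3.35) and (3.36).

```python
def mle_location_given_omega(ybar11, ybar22, ybar31, ybar32,
                             r1, r2, r3, s11, s22, s12):
    """Conditional MLE of (mu1, mu2) given Omega for 3-part compositions,
    following Equations (3.35) and (3.36).

    ybar11, ybar22: means of the univariate logratio data (groups 1 and 2).
    ybar31, ybar32: component means of the bivariate logratio data (group 3).
    r1, r2, r3:     numbers of observations in the three zero-pattern groups.
    """
    denom = (r1 + r3) * (r2 + r3) * s11 * s22 - r1 * r2 * s12**2
    mu1 = ((r1 * ybar11 + r3 * ybar31) * (r2 + r3) * s11 * s22
           - r1 * ybar11 * r2 * s12**2
           + (ybar22 - ybar32) * r2 * r3 * s11 * s12) / denom
    mu2 = ((r2 * ybar22 + r3 * ybar32) * (r1 + r3) * s11 * s22
           - r2 * ybar22 * r1 * s12**2
           + (ybar11 - ybar31) * r1 * r3 * s12 * s22) / denom
    return mu1, mu2

# Sanity check: with s12 = 0 the MLE reduces to the simple estimator.
mu1, mu2 = mle_location_given_omega(-1.6, -0.5, -1.7, -0.6, 3, 3, 2,
                                    s11=1.0, s22=1.0, s12=0.0)
```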

In the case where there are no univariate data from the second component, i.e., r2 = 0, we have:

\[
(\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3)\big|_{r_2=0}
= \frac{r_1\bar{y}_{11}+r_3\bar{y}_{31}}{r_1+r_3}
= \frac{1}{r_1+r_3}\Bigl[\sum_{j=1}^{r_3} y_{3j1} + \sum_{j=1}^{r_1} y_{1j1}\Bigr]
= \mu^{*}_{1}. \tag{3.37}
\]

Similarly, by symmetry, (µ̂2 | Ω, r1, r2, r3)|_{r1=0} = µ*2. When r3 = 0 we have

\[
(\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3)\big|_{r_3=0} = \bar{y}_{11} = \mu^{*}_{1}, \tag{3.38}
\]
\[
(\hat{\mu}_2 \mid \Omega, r_1, r_2, r_3)\big|_{r_3=0} = \bar{y}_{22} = \mu^{*}_{2}. \tag{3.39}
\]

3.5.4 Unbiasedness of Conditional MLE for 3-Part Composition

To show that µ̂ | Ω, r1, r2, r3 is unbiased, we start by pointing out the expectations of the various means:

\[
E[\bar{y}_{11}] = E\Bigl[\frac{1}{r_1}\sum_{j=1}^{r_1} y_{1j1}\Bigr] = \frac{1}{r_1}\sum_{j=1}^{r_1} E[y_{1j1}] = \mu_1 \tag{3.40}
\]
\[
E[\bar{y}_{22}] = E\Bigl[\frac{1}{r_2}\sum_{j=1}^{r_2} y_{2j2}\Bigr] = \frac{1}{r_2}\sum_{j=1}^{r_2} E[y_{2j2}] = \mu_2 \tag{3.41}
\]
\[
E\begin{pmatrix}\bar{y}_{31}\\ \bar{y}_{32}\end{pmatrix}
= E\Bigl[\frac{1}{r_3}\sum_{j=1}^{r_3} \mathbf{y}_{3j}\Bigr]
= \frac{1}{r_3}\sum_{j=1}^{r_3} E[\mathbf{y}_{3j}]
= \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} \tag{3.42}
\]

When we take the expectation in expression (3.35), the term with (ȳ22 − ȳ32) vanishes because E[ȳ22] = E[ȳ32]. That leaves only terms with E[ȳ11] = µ1 and E[ȳ31] = µ1, which we can factor:

\[
E[\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3]
= \frac{\mu_1\bigl[(r_1+r_3)(r_2+r_3)s_{11}s_{22} - r_1 r_2 s_{12}^{2}\bigr]}
       {(r_1+r_3)(r_2+r_3)s_{11}s_{22} - r_1 r_2 s_{12}^{2}}
= \mu_1. \tag{3.43}
\]

This shows that µ̂1 is unbiased. By symmetry, µ̂2 is unbiased as well.

3.5.5 General Maximum Likelihood Estimators

For the general case of the MLE in higher dimensions than shown here, the log likelihood can be differentiated, and the score functions can be solved with a computer algebra system. In addition, the Hessian can be checked to verify the solution is a maximum. We have done this for the case of 3-part compositions but not for the general case of D dimensions.
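As an illustration of that computer algebra step, the following SymPy sketch writes the µ-dependent part of the 3-part log likelihood in terms of the group means, differentiates with respect to µ1 and µ2, and solves the score equations symbolically. It is only a sketch of the kind of derivation described above, not the software used for the dissertation.

```python
import sympy as sp

mu1, mu2, s11, s22, s12 = sp.symbols('mu1 mu2 s11 s22 s12')
r1, r2, r3 = sp.symbols('r1 r2 r3', positive=True)
ybar11, ybar22, ybar31, ybar32 = sp.symbols('ybar11 ybar22 ybar31 ybar32')

# Quadratic-form part of the log likelihood (3.28), written with group means;
# additive constants that do not involve mu are dropped.
Omega = sp.Matrix([[s11, s12], [s12, s22]])
A = Omega.inv()
q3 = sp.Matrix([ybar31 - mu1, ybar32 - mu2])
loglik = -(r1 * (ybar11 - mu1)**2 / s11
           + r2 * (ybar22 - mu2)**2 / s22
           + r3 * (q3.T * A * q3)[0, 0]) / 2

score = [sp.diff(loglik, m) for m in (mu1, mu2)]
sol = sp.solve(score, [mu1, mu2], dict=True)[0]
print(sp.simplify(sol[mu1]))   # algebraically equivalent to Equation (3.35)
```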

3.6 Variances of Location Estimators

Next we find the variances of the two location estimators, the MLE and the simple estimator. Both are unbiased. A question we need to answer is: what is the efficiency of the simple estimator relative to the MLE? We have been using µ̂ for the MLE. We continue to use µ* for the simple estimator (of the location). In our discussion,

\[
\text{efficiency}(\mu^{*}_{1}, \hat{\mu}_{1}) = \frac{\operatorname{Var}(\hat{\mu}_1)}{\operatorname{Var}(\mu^{*}_{1})}. \tag{3.44}
\]

3.6.1 Variances of Location Estimators

The variances of the MLE and the simple location estimator are derived in the Appendix. They are:

\[
\operatorname{Var}(\hat{\mu}_1\mid\Omega, r_1, r_2, r_3)
= \frac{r_3\bigl((r_3+r_2)^{2} s_{11}^{3} s_{22}^{2} + r_2^{2} s_{11}^{2} s_{12}^{2} s_{22} - 2 r_2 (r_3+r_2) s_{11}^{2} s_{12}^{2} s_{22}\bigr) + r_2 r_3^{2} s_{11}^{2} s_{12}^{2} s_{22} + r_1 s_{11}\bigl((r_3+r_2) s_{11} s_{22} - r_2 s_{12}^{2}\bigr)^{2}}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22} - r_1 r_2 s_{12}^{2}\bigr)^{2}} \tag{3.45}
\]

\[
\operatorname{Var}(\hat{\mu}_2\mid\Omega, r_1, r_2, r_3)
= \frac{r_3\bigl((r_3+r_1)^{2} s_{22}^{3} s_{11}^{2} + r_1^{2} s_{22}^{2} s_{12}^{2} s_{11} - 2 r_1 (r_3+r_1) s_{22}^{2} s_{12}^{2} s_{11}\bigr) + r_1 r_3^{2} s_{22}^{2} s_{12}^{2} s_{11} + r_2 s_{22}\bigl((r_3+r_1) s_{22} s_{11} - r_1 s_{12}^{2}\bigr)^{2}}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22} - r_1 r_2 s_{12}^{2}\bigr)^{2}} \tag{3.46}
\]

\[
\operatorname{Var}(\mu^{*}_{1}\mid\Omega, r_1, r_2, r_3) = \frac{s_{11}}{r_1 + r_3}. \tag{3.47}
\]

\[
\operatorname{Var}(\mu^{*}_{2}\mid\Omega, r_1, r_2, r_3) = \frac{s_{22}}{r_2 + r_3}. \tag{3.48}
\]

3.6.2 Relative Efficiency of Location Estimators

The first thing we show is that when the covariance element of Ω is zero, i.e., s12 = 0, then Var(µ̂) = Var(µ*).

\[
\operatorname{Var}(\hat{\mu}_1\mid\Omega, r_1, r_2, r_3)
= \frac{r_3\bigl((r_3+r_2)^{2} s_{11}^{3} s_{22}^{2} + r_2^{2} s_{11}^{2} s_{12}^{2} s_{22} - 2 r_2 (r_3+r_2) s_{11}^{2} s_{12}^{2} s_{22}\bigr)}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22} - r_1 r_2 s_{12}^{2}\bigr)^{2}} \tag{3.49}
\]
\[
\qquad + \frac{r_2 r_3^{2} s_{11}^{2} s_{12}^{2} s_{22} + r_1 s_{11}\bigl((r_3+r_2) s_{11} s_{22} - r_2 s_{12}^{2}\bigr)^{2}}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22} - r_1 r_2 s_{12}^{2}\bigr)^{2}}. \tag{3.50}
\]

Evaluate at s12 = 0.

\[
\operatorname{Var}(\hat{\mu}_1\mid\Omega, r_1, r_2, r_3)\big|_{s_{12}=0}
= \frac{r_3 (r_3+r_2)^{2} s_{11}^{3} s_{22}^{2} + r_1 s_{11}\bigl((r_3+r_2) s_{11} s_{22}\bigr)^{2}}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22}\bigr)^{2}} \tag{3.51}
\]

Factor numerator and denominator.

\[
\operatorname{Var}(\hat{\mu}_1\mid\Omega, r_1, r_2, r_3)\big|_{s_{12}=0}
= \frac{(r_3+r_2)^{2} s_{11}^{3} s_{22}^{2} (r_3+r_1)}{(r_3+r_1)^{2} (r_3+r_2)^{2} s_{11}^{2} s_{22}^{2}}
= \frac{s_{11}}{r_3+r_1}
= \operatorname{Var}(\mu^{*}_{1}). \tag{3.52}
\]

Similarly,

\[
\operatorname{Var}(\hat{\mu}_2\mid\Omega, r_1, r_2, r_3)\big|_{s_{12}=0}
= \frac{s_{22}}{r_3+r_2}
= \operatorname{Var}(\mu^{*}_{2}). \tag{3.53}
\]

We have already shown in Section 3.5.3 that when r2 = 0, µ̂1 = µ*1; when r1 = 0, µ̂2 = µ*2; and when r3 = 0, µ̂1 = µ*1 and µ̂2 = µ*2. Next we need to compare the variance of µ* with the variance of µ̂ in cases where the estimators are not obviously the same. We consider a sample of 100 compositions from a logistic normal distribution with the number of zeros in part 1 ranging from 0 to 100, and similarly for part 2.

We calculate the relative efficiency. These are not simulations; they are calculations based on the expressions for the variances of the estimators. We consider all possible combinations of r1, r2, r3 such that r1 + r2 + r3 = 100. A larger sample would give roughly the same picture, just with finer granularity. In addition, while we want to understand the effect of the covariance term s12 for every possible value between −1 and 1, we get a feel for the space by choosing three values, s12 ∈ {0, 0.2, 0.8}. For simplicity we choose s11 = s22 = 1.

In all three figures, we plot Var(µ̂2)/Var(µ*2) versus Var(µ̂1)/Var(µ*1). In Figure 3.1 we use a small covariance term, s12 = 0.2. In Figure 3.2 we use a large covariance, s12 = 0.8. In both figures, we shade by the size of r1 relative to r2. We already showed in (3.52) and (3.53) that when s12 = 0, the relative efficiency of µ* with respect to µ̂ is 1, so there is no plot for s12 = 0.
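A short Python sketch of the calculation behind these figures is given below; it simply evaluates Equations (3.45)–(3.48) over all (r1, r2, r3) summing to 100 for a chosen s12. The function names are ours, and the plotting step is omitted.

```python
def var_mle1(r1, r2, r3, s11, s22, s12):
    """Var(mu1_hat | Omega, r1, r2, r3) from Equation (3.45)."""
    num = (r3 * ((r3 + r2)**2 * s11**3 * s22**2
                 + r2**2 * s11**2 * s12**2 * s22
                 - 2 * r2 * (r3 + r2) * s11**2 * s12**2 * s22)
           + r2 * r3**2 * s11**2 * s12**2 * s22
           + r1 * s11 * ((r3 + r2) * s11 * s22 - r2 * s12**2)**2)
    den = ((r3**2 + (r2 + r1) * r3 + r1 * r2) * s11 * s22 - r1 * r2 * s12**2)**2
    return num / den

def var_simple1(r1, r2, r3, s11):
    """Var(mu1_star | Omega, r1, r2, r3) from Equation (3.47)."""
    return s11 / (r1 + r3)

s11 = s22 = 1.0
s12 = 0.8
rows = []
for r1 in range(101):
    for r2 in range(101 - r1):
        r3 = 100 - r1 - r2
        if r1 + r3 == 0 or r2 + r3 == 0:
            continue                       # mu1* or mu2* would not be estimable
        eff1 = var_mle1(r1, r2, r3, s11, s22, s12) / var_simple1(r1, r2, r3, s11)
        # By the symmetry of (3.46), swap the roles of the two components.
        eff2 = var_mle1(r2, r1, r3, s22, s11, s12) / var_simple1(r2, r1, r3, s22)
        rows.append((r1, r2, r3, eff1, eff2))

print(min(r[3] for r in rows), min(r[4] for r in rows))   # worst-case efficiencies
```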

Figure 3.1: Efficiency of µ* Relative to µ̂ with Low Covariance (0.2). [Plot of Var(mle2)/Var(simple2) against Var(mle1)/Var(simple1), points shaded by r1 − r2.]

Figure 3.1 shows the relationship between the efficiency of µ*1 and µ*2 and the relative sizes of r1 and r2. In the worst case, when r1 >> r2, the efficiency of µ*1 approaches 1, and the efficiency of µ*2 falls off toward 0.97. A point to note here is that for a relatively small covariance, 0.2, the simple estimator µ* has a variance almost as small as that of µ̂. We will save discussion of the bands or striations for Figure 3.3.

Figure 3.2: Efficiency of µ* Relative to µ̂ with High Covariance (0.8). [Same axes as Figure 3.1, points shaded by r1 − r2.]

Figure 3.2, which shows efficiency based on a covariance of 0.8, has the same pattern as Figure 3.1, but with larger variances for µ*, i.e., smaller efficiency. Here the worst cases can have an efficiency of less than 0.5 for either component of µ*, though when the efficiency of µ*1 is that small, the efficiency of µ*2 is very near 1.

Figure 3.3: Efficiency of µ* Relative to µ̂ with High Covariance (0.8) and Relative to r3. [Same axes as Figure 3.1, points shaded by the value of r3.]

Figure 3.3 shows the same points, for a covariance of 0.8, but shaded by the value of r3. To help decipher it, we show a subset of the points in Figure 3.4.

Figure 3.4: Efficiency of µ* Relative to µ̂ with High Covariance (0.8), restricted to r3 ∈ {1, 2, 3, 4, 61, 62, 63, 64}. [Same axes as Figure 3.1, points shaded by r3.]

Figure 3.4 shows a subset of the points, only the points where r3 ∈ {1, 2, 3, 4, 61, 62, 63, 64}. When r3 is very small there is a wide range of possibilities for r1 and r2. The four leftmost points in the upper left of Figure 3.4 are points where r1 is 1 or 2; r2 is somewhere between 94 and 97, and r3 is 2, 3, or 4. In these cases, the sample for estimating µ1 is very small, from 3 to 6 points, some from univariate data and some from the bivariate data. In that case, the MLE has a much smaller variance than the simple estimator. There is a much larger sample from univariate data for estimating µ2, upwards of 90 points, plus a handful of points from the bivariate data, so the difference between the variance of µ*2 and µ̂2 is very small. Graphs with negative covariances, −0.2 and −0.8, look the same as with positive covariances, and are omitted for the sake of brevity.

3.6.3 Summary of Relative Efficiency

Both µ* and µ̂ are unbiased given Ω. The efficiency of µ* relative to µ̂ tends to decrease as the covariance component of Ω increases. We say "tends" because even with a covariance of 0.8, there are cases where the efficiency of both components of µ* relative to µ̂ is very close to one. When there are relatively few zeros, and they are balanced, µ* has a variance almost as small as µ̂. The more zeros there are, or the more unbalanced their distribution is, the larger the variance of one or more components of the simple estimator.

Chapter 4

Subcompositional Coherence

One of the motivations for using the logistic normal distribution, mentioned repeatedly in the literature, is a property called subcompositional coherence. Although there are multiple definitions, none are very precise. The spirit of the idea is this, from Aitchison and Egozcue (2005):

"Subcompositional coherence demands that two scientists, one using full compositions and the other using subcompositions of these full compositions, should make the same inference about relations within the common parts." (4.1)

Aitchison writes about the importance of the property in Aitchison (1994), Aitchison (1999), Aitchison et al. (2002), and Aitchison and Egozcue (2005). Others have argued for it as well, e.g., Egozcue (2009) and Egozcue and Pawlowsky-Glahn (2011). There is controversy though. Butler and Glasbey (2008) claimed, "Any approach that attempts to describe zero and non-zero proportions by using a common model will inevitably break these principles [scale invariance and subcompositional coherence], because ratios of proportions are infinite along the boundaries of the simplex." Scealy and Welsh (2014) also object: "We show that this Principle [subcompositional coherence] is based on implicit assumptions and beliefs that do not always hold. Moreover, it is applied selectively because it is not actually satisfied by the log-ratio methods it is intended to justify." In this chapter we posit that part of the reason for the controversy is the lack of a precise definition, and we propose one. We take the statement in (4.1) to mean that when estimating the mean and covariance of a subcomposition of a set of data, the answer should be the same whether one computes the mean on the subcomposition, or takes the subcomposition of the full mean. We state this more formally in Section (4.4), but first we illustrate the idea with some examples from our spice-lentil-rice data. We define a naive mean estimator and show it does not have this property, and we show that µ* does. Then we give examples of covariance estimators which do and do not have this property. We have two goals for this chapter. One is to state this important property much more precisely. The second is to show that it is not just a property of distributions, but of estimators.

4.1 Examples

For the examples in the next two subsections, we refer to rows 1–8 of our spice-lentil-rice data set, Table 4.1.

    spices  lentils  rice
1   0.16    0.00     0.84
2   0.17    0.00     0.83
3   0.16    0.00     0.84
4   0.00    0.37     0.63
5   0.00    0.37     0.63
6   0.00    0.37     0.63
7   0.12    0.33     0.55
8   0.11    0.34     0.56

Table 4.1: Some artificial data

4.1.1 Naive Estimator, µ̊

This section illustrates an estimator which produces different results depending on the order of operations, i.e., whether the subcomposition is taken before or after calculation of the mean. We start by defining a naive estimator µ̊ = [µ̊i], i ∈ {1, 2, ..., D}:

\[
\mathring{\mu}_i = \frac{1}{n}\sum_{t=1}^{n} x_{ti}. \tag{4.2}
\]

First we compute µ̊ for the full set: (spices: 0.089, lentils: 0.223, rice: 0.688). Take the subcomposition {1,3}: (spices: 0.089, rice: 0.688). And renormalize: (spices: 0.114, rice: 0.886).

When we start by taking the subcomposition {1,3} first, and then calculating the mean, the result changes. The subcomposition {1,3} is shown in Tables (4.2) and (4.3).

Table 4.2: Raw Subset
    spices  rice
1   0.16    0.84
2   0.17    0.83
3   0.16    0.84
4   0.00    0.63
5   0.00    0.63
6   0.00    0.63
7   0.12    0.55
8   0.11    0.56

Table 4.3: Subcomposition (renormalized)
    spices  rice
1   0.16    0.84
2   0.17    0.83
3   0.16    0.84
4   0.00    1.00
5   0.00    1.00
6   0.00    1.00
7   0.17    0.83
8   0.16    0.84

Computing µ̊{1,3} from the renormalized values in Table (4.3), the result is (spices: 0.103, rice: 0.897). This value is different from the one above. For this mean estimator, the values are not invariant to the order of operations.

Table 4.4: Result is not order invariant for µ̊
                                     spices  rice
mean first, subcomposition second    0.114   0.886
subcomposition first, mean second    0.103   0.897

4.1.2 ALR̃ Estimator, µ*

Now we do a similar pair of calculations with µ* from Equation (3.20) in Section (3.4.2), again using the data in Table 4.1. When we calculate µ* first, before taking a subcomposition, we get: (log(spices/rice): −1.626, log(lentils/rice): −0.517). The subvector without lentils is (log(spices/rice): −1.626). Translating back to a composition with alr̃⁻¹ gives (spices: 0.164, rice: 0.836).

If we take a subcomposition first, as in Tables 4.2 and 4.3, and then compute µ*, we get (log(spices/rice): −1.626). Translating back to a composition with alr̃⁻¹ gives (spices: 0.164, rice: 0.836), which is the same as in the previous paragraph. The point is that µ* is invariant to the order of operations.
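The contrast between the two estimators can also be checked numerically. The sketch below, with illustrative helper names, computes the naive mean and the nonzero-logratio mean both ways on the artificial data of Table 4.1.

```python
import numpy as np

X = np.array([[0.16, 0.00, 0.84], [0.17, 0.00, 0.83], [0.16, 0.00, 0.84],
              [0.00, 0.37, 0.63], [0.00, 0.37, 0.63], [0.00, 0.37, 0.63],
              [0.12, 0.33, 0.55], [0.11, 0.34, 0.56]])

def closure(A):
    """Renormalize each row to sum to one."""
    return A / A.sum(axis=1, keepdims=True)

def naive_mean(A):
    v = A.mean(axis=0)
    return v / v.sum()

def logratio_mean(A):
    """Mean of log(x_i/x_D) over rows where x_i > 0, mapped back to a composition."""
    D = A.shape[1]
    m = np.array([np.log(A[A[:, i] > 0, i] / A[A[:, i] > 0, D - 1]).mean()
                  for i in range(D - 1)])
    comp = np.append(np.exp(m), 1.0)
    return comp / comp.sum()

sub = [0, 2]                                    # keep spices and rice only

# Naive mean: the two orders of operations disagree.
m_full = naive_mean(X)
print(m_full[sub] / m_full[sub].sum())          # mean first, then subcomposition
print(naive_mean(closure(X[:, sub])))           # subcomposition first: differs

# Logratio mean: the two orders agree.
lm_full = logratio_mean(X)
print(lm_full[sub] / lm_full[sub].sum())
print(logratio_mean(closure(X[:, sub])))
```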

4.2 Covariance Estimators

Covariance estimators can also be, but need not be, invariant to the order of operations. Aitchison (1986) (pp. 52–55) defines a "crude covariance matrix," K = [κij : i, j = 1, ..., D] (D×D), with κij = Cov(xi, xj), (i, j = 1, ..., D; r = 1, ..., N; n = N − 1). He defines the estimator in the way that is common for multivariate noncompositional data:

\[
\hat{K} = [\hat{\kappa}_{ij}]; \qquad
\hat{\kappa}_{ij} = n^{-1}\sum_{r=1}^{N}(x_{ri}-\bar{x}_i)(x_{rj}-\bar{x}_j), \quad (i, j = 1, \ldots, D); \qquad
\bar{x}_i = N^{-1}\sum_{r=1}^{N} x_{ri}. \tag{4.3}
\]

He shows that the order of operations matters. Taking a subcomposition of the data and then estimating K̂ gives a different result than estimating K̂ from the full data and then projecting onto the subspace determined by the subcomposition. We make this explicit in Section 4.4. Specifically, Aitchison (1986) showed that Condition (4.14) (which we get to shortly) does not hold for K̂. In contrast, the ALR̃ estimator, Ω*, was defined to have the property that estimated covariance matrices derived from subcompositions are equal to the submatrix of Ω* estimated from the full data set.

4.3 More From the Literature

Two other definitions of subcompositional coherence from the literature are given below. After listing them, we offer our own, which we think more precisely captures the idea above in (4.1). Egozcue (2009) produced a more detailed characterization. He listed four principles that are meant to give insight into how compositional data should be thought of, and one of them is compositional coherence. Here we give only the two principles relevant to subcompositional coherence. Egozcue (2009), p. 830, wrote,

"A. Scale Invariance. The information in a composition does not depend on the particular units in which the composition is expressed. Proportional positive vectors represent the same composition. Any sensible characteristic of a composition should be invariant under a change of scale.

C. Subcompositional Coherence. Information conveyed by a composition of D parts should not be in contradiction with that coming from a subcomposition containing d parts, d ≤ D. This generic principle can be formulated more precisely as

Sub-compositional Dominance: If ∆n(·, ·) is any distance between compositions of n parts, then ∆D(x, y) ≥ ∆d(xd, yd), where x, y are compositions of D parts and xd, yd are sub-compositions of the previous ones with d parts, d ≤ D.

Ratio preserving: Any relevant characteristic expressed as a function of the parts of a composition is exclusively a function of the ratios of its parts. In a sub-composition these characteristics depend only on the ratios of the selected parts and not on the discarded parts of the parent composition. Principle A applies to the sub-composition." (4.4)

More recently, Egozcue and Pawlowsky-Glahn (2011), p. 16, wrote,

"The principle of coherence can be summarized as two criteria: (a) the principle of scale invariance should hold for any of the possible subcompositions, thus implying preservation of ratios of parts; (b) if a distance or divergence is used to compare compositions, this distance or divergence should be greater than or equal to that obtained comparing the corresponding subcompositions (subcompositional dominance)." (4.5)

4.4 An Alternative Characterization

Both of the preceding characterizations of subcompositional coherence are somewhat vague. Here we offer an alternative definition in the spirit of Aitchison and Egozcue (2005), but more precise. Let x_i = (x_{i1}, x_{i2}, ..., x_{id}, x_{iD})^T be a D-place composition, and let

\[
X = \begin{pmatrix} \mathbf{x}_1^{T} \\ \mathbf{x}_2^{T} \\ \vdots \\ \mathbf{x}_n^{T} \end{pmatrix} \tag{4.6}
\]

be a matrix of compositions. Let B (k×D) be a selection matrix specifying a subcomposition of k parts corresponding to the ordered set of indices {j1, j2, ..., j_{k−1}, j_k}, where j_k = D. That is, the set of indices of the subcomposition includes the Dth part. Let B ((k−1)×d) be a selection matrix specifying the components corresponding to {j1, j2, ..., j_{k−1}}, i.e., all except the Dth part. Let Loc_d(·) be an estimator of the location parameter, with estimate µ̈, and let Var_d(·) be an estimator of the covariance parameter, with estimate Ω̈, as in (4.7) and (4.8).

\[
\operatorname{Loc}_d : \mathcal{S}^{n\times D} \to \mathbb{R}^{d\times 1}, \qquad X \mapsto \ddot{\boldsymbol{\mu}} \tag{4.7}
\]
\[
\operatorname{Var}_d : \mathcal{S}^{n\times D} \to \mathbb{R}^{d\times d}, \qquad X \mapsto \ddot{\Omega} \tag{4.8}
\]

We define subcompositional coherence as requiring that the conditions in Equations (4.9) and (4.10) hold. Condition (4.9) says that you get the same result whether you take the subcomposition of the data first and then estimate the mean, or do things in the other order, estimating the mean first and then taking the subcomposition. Condition (4.10) says that taking a subcomposition and then estimating the covariance gives the same result as estimating the full covariance and then projecting onto the subspace of the subcomposition.

\[
\operatorname{Loc}_{k-1}\Bigl(\mathcal{C}\bigl(X\, B_{k\times D}^{T}\bigr)\Bigr)
 = B_{(k-1)\times d}\,\operatorname{Loc}_d(X), \tag{4.9}
\]
\[
\operatorname{Var}_{k-1}\Bigl(\mathcal{C}\bigl(X\, B_{k\times D}^{T}\bigr)\Bigr)
 = B_{(k-1)\times d}\,\operatorname{Var}_d(X)\, B_{(k-1)\times d}^{T}. \tag{4.10}
\]

Both sides of (4.9) are (k−1)×1, and both sides of (4.10) are (k−1)×(k−1).

The simple mean estimator, µ*, is defined in such a way (Equation (3.20)) that the estimate µ*i is independent of the other µ*j, j ≠ i. Hence it satisfies Condition (4.9), whether or not there are zeros. Similarly, our simple covariance estimator, by the way it is defined in Equations (3.21) and (3.22), satisfies Condition (4.10). The situation for our maximum likelihood estimator for the mean, however, is more complicated. When there are no zeros, our mixture reduces to a single logistic normal distribution and the MLE for the mean is the same as the simple estimator, and Condition (4.9) holds. Similarly, for the covariance, when there are no zeros, the MLE for the covariance is the same as the simple estimator, and satisfies Condition (4.10).

However, in the presence of zero components, the MLE for the mean does not satisfy Condition (4.9). Recall that we showed in Section (3.5.3), in Equations (3.35, 3.36), that when there are zeros, the MLE of the mean is not independent of the covariance. Data from the ith component can influence the MLE of µj (i ≠ j), via covariances. If we then compute the location MLE of a subcomposition that does not contain the ith component, we can get a different value for the estimate of µj. The conditions in Equations (4.9, 4.10) have selection matrices of two different dimensions. These are needed to be compatible with models which use one part of the composition as a reference component, like the logistic normal, or the approach by Leininger et al. (2013). For the approaches that do not use a reference component, like the Dirichlet, and the approaches by Scealy and Welsh (2014) and Butler and Glasbey (2008), the conditions would be as in Equations (4.12) and (4.14).

\[
\operatorname{Loc}_D : \mathcal{S}^{n\times D} \to \mathbb{R}^{D\times 1}, \qquad X \mapsto \ddot{\boldsymbol{\mu}} \tag{4.11}
\]
\[
\operatorname{Loc}_{k}\Bigl(\mathcal{C}\bigl(X\, B_{k\times D}^{T}\bigr)\Bigr) = B_{k\times D}\,\operatorname{Loc}_D(X). \tag{4.12}
\]
\[
\operatorname{Var}_D : \mathcal{S}^{n\times D} \to \mathbb{R}^{D\times D}, \qquad X \mapsto \ddot{\Omega} \tag{4.13}
\]
\[
\operatorname{Var}_{k}\Bigl(\mathcal{C}\bigl(X\, B_{k\times D}^{T}\bigr)\Bigr) = B_{k\times D}\,\operatorname{Var}_D(X)\, B_{k\times D}^{T}. \tag{4.14}
\]

Both sides of (4.12) are k×1, and both sides of (4.14) are k×k.

It is worth noting that Conditions (4.9), (4.10), (4.12), and (4.14) are defined in terms of the estimators, not the distributions. We have shown that for our logistic normal mixture distribution, there are estimators which satisfy Conditions (4.9, 4.10) and estimators which do not. Hence subcompositional coherence is not solely a property of a distribution; it is a property of estimators.

In Section (2.6.2) we noted that Butler and Glasbey (2008) point out that the principle of subcompositional coherence does not hold for their model. Scealy and Welsh (2014) cite a technical report by Aitchison (2003) in which he claims that approaches based on spherical coordinates do not satisfy the principle of subcompositional coherence. They do not rebut the claim, but rather argue that subcompositional coherence should not be taken to be important. As for the Dirichlet, we conjecture that the MLE does not satisfy Condition (4.12), but we have not shown that. There is no closed form for the MLE, so the task would involve obtaining or manufacturing a set of data and using numerical methods to find the MLE before and after taking subcompositions. We have left that for future work.

Chapter 5

Two-sample Permutation Test

5.1 Introduction

In this chapter we develop a permutation test for comparing means of two samples. We start with Hotelling's T-squared test, a well known test for comparing means of two multivariate normal samples, and develop a permutation test based on the same statistic. A common model for a univariate test of a difference in means is the shift model (Romano (1990); Good (2000); Lehmann (2009)). If F and G are two continuous cumulative distribution functions, the null and alternative hypotheses are these.

H0 : G(y) = F (y). (5.1)

Ha : G(y) = F (y − ∆), ∆ > 0. (5.2)

In the extension to multivariate distributions, ∆ becomes multivariate, so we need a statistic that maps Rd → R. Hotelling's T-squared has that property, and seems reasonable for the same reasons it is reasonable when the distributions are multivariate t, namely, we are computing a magnitude of a distance between two means, adjusting for variances and covariances.

5.2 A Permutation Test

Suppose we have two samples, X1 and X2, of D-place compositions x_i, where X1 = [x_1, x_2, ..., x_{n_1}]^T and X2 = [x_{n_1+1}, x_{n_1+2}, ..., x_n]^T. Let

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
  = \begin{pmatrix} \mathbf{x}_1^{T} \\ \vdots \\ \mathbf{x}_{n_1}^{T} \\ \mathbf{x}_{n_1+1}^{T} \\ \vdots \\ \mathbf{x}_{n}^{T} \end{pmatrix}
  \quad (n\times D). \tag{5.3}
\]

The individual compositions are x_t^T = [x_{t1}, x_{t2}, ..., x_{tD}], for t ∈ {1, 2, ..., n}. Let the number of compositions in X2 be n2 = n − n1. The order of elements within each composition is fixed. When we use the phrase permutation of X we mean a permutation of the rows. When we speak of exchangeability later, we refer to the relative order of two whole compositions (rows), not their components.

We use X* to refer to a permutation of X. In any permutation X*, we assign the first n1 elements to group 1, written X*1, and the next n2 elements to group 2, X*2. That is,

\[
X^{*} = \begin{pmatrix} X_1^{*} \\ X_2^{*} \end{pmatrix} \quad (n\times D). \tag{5.4}
\]

5.2.1 Test Statistic

We base our statistic on Hotelling's T-squared (Lattin et al. (2003); Pesarin (2001)):

\[
\text{Hotelling's } T^{2} = \frac{n_1 n_2}{n_1 + n_2}\, \Delta^{T} S_p^{-1} \Delta. \tag{5.5}
\]

Our permutation statistic is

\[
T^{*} = \Delta^{*T} S_p^{-1} \Delta^{*}, \tag{5.6}
\]

where ∆* = m*1 − m*2 is a difference of means (of logratios), and S_p^{-1} is the inverse of the estimated pooled variance-covariance of ∆*. These are computed in the alr-transformed space. If there are no zeros in the data, our statistic reduces to Hotelling's T-squared without the initial constant. m*1 and m*2 are our simple location estimators from earlier. We define the pooled covariance in terms of nonzero cases from X1 and X2 so as to make sure it is nonnegative definite. We compute the pooled covariance once, from the original samples, and use it repeatedly, unchanged, for each permutation, as described in Pesarin (2001), p. 153.

5.2.2 Pooled Covariance, Sp

We define our pooled covariance, Sp, in terms of the covariances of the two samples, S1 and S2. For each sample we compute something analogous to the standard complete-cases estimate for the covariance, and then compute the weighted average of the two estimates. First we define submatrices of X1 and X2 which contain only compositions with no zero components. We call these submatrices X1NZ and X2NZ, respectively. We perform the logratio transformation on each submatrix, giving

\[
Y_{1NZ} = [\operatorname{alr}(\mathbf{x}_t^{T})] \quad\text{where } \mathbf{x}_t^{T} \in X_{1NZ}, \text{ and} \tag{5.7}
\]
\[
Y_{2NZ} = [\operatorname{alr}(\mathbf{x}_t^{T})] \quad\text{where } \mathbf{x}_t^{T} \in X_{2NZ}. \tag{5.8}
\]

We compute the covariance estimates of Y1NZ and Y2NZ in the standard way, as if each were a matrix of multivariate normal data. We use N1 for the number of rows in Y1NZ, and N2 for the number of rows in Y2NZ. We use I for an identity matrix of the appropriate dimension (either N1 × N1 or N2 × N2), and 1 for a vector of ones of the appropriate dimension (either N1 × 1 or N2 × 1). S1NZ and S2NZ denote the respective variance-covariance estimates.

\[
S_{1NZ} = \frac{1}{N_1 - 1}\, Y_{1NZ}^{T}\Bigl(I - \frac{1}{N_1}\mathbf{1}\mathbf{1}^{T}\Bigr) Y_{1NZ}, \quad\text{and} \tag{5.9}
\]
\[
S_{2NZ} = \frac{1}{N_2 - 1}\, Y_{2NZ}^{T}\Bigl(I - \frac{1}{N_2}\mathbf{1}\mathbf{1}^{T}\Bigr) Y_{2NZ}. \tag{5.10}
\]

The pooled covariance estimate is the sum of the covariance estimates of the means of the samples.

\[
S_p = \frac{1}{N_1} S_{1NZ} + \frac{1}{N_2} S_{2NZ} \tag{5.11}
\]
\[
\;\;\; = \frac{N_2 S_{1NZ} + N_1 S_{2NZ}}{N_1 N_2}. \tag{5.12}
\]

Before pooling the covariances, it makes sense to check whether they are significantly different. We use Box's M test for that (Lattin et al. (2003), p. 442).
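A minimal Python sketch of this pooled-covariance computation, assuming the last component is the reference part and using only the complete (all-nonzero) rows of each sample, is given below. The function names are illustrative.

```python
import numpy as np

def alr(x):
    """Additive logratio transform of one composition (last part as reference)."""
    return np.log(x[:-1] / x[-1])

def pooled_covariance(X1, X2):
    """S_p as in Equations (5.9)-(5.12): complete-case covariances S_1NZ and
    S_2NZ, combined as S_p = S_1NZ / N1 + S_2NZ / N2."""
    Y1 = np.array([alr(x) for x in X1 if np.all(x > 0)])
    Y2 = np.array([alr(x) for x in X2 if np.all(x > 0)])
    N1, N2 = len(Y1), len(Y2)
    S1 = np.cov(Y1, rowvar=False)          # divisor N1 - 1, matching (5.9)
    S2 = np.cov(Y2, rowvar=False)          # divisor N2 - 1, matching (5.10)
    return S1 / N1 + S2 / N2
```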

5.2.3 Difference of Means, ∆

First we develop ∆. The definitions below stipulate that for each sample, j ∈ {1, 2}, the statistic m*j is a vector of component-wise means of logratios of compositions, excluding the zero terms.

For any permutation X*, we write X* = [x*_{ti}] (n×D), and we define the sample means m*_j^T for j ∈ {1, 2}. We use i ∈ {1, 2, ..., d} to index the individual components.

\[
\mathbf{m}_j^{*T} = [m^{*}_{j1}, m^{*}_{j2}, \ldots, m^{*}_{jd}] \tag{5.13}
\]
\[
\phantom{\mathbf{m}_j^{*T}} = [m^{*}_{ji}], \quad\text{where} \tag{5.14}
\]
\[
m^{*}_{1i} = \frac{1}{n_{1i}} \sum_{\{t \,:\, t\in\{1,2,\ldots,n_1\}\ \&\ x^{*}_{ti}\neq 0\}} \log(x^{*}_{ti}/x^{*}_{tD}); \tag{5.15}
\]
\[
m^{*}_{2i} = \frac{1}{n_{2i}} \sum_{\{t \,:\, t\in\{n_1+1,n_1+2,\ldots,n\}\ \&\ x^{*}_{ti}\neq 0\}} \log(x^{*}_{ti}/x^{*}_{tD}); \tag{5.16}
\]
\[
n_{1i} = \sum_{t=1}^{n_1} I(x_{ti} \neq 0); \tag{5.17}
\]
\[
n_{2i} = \sum_{t=n_1+1}^{n} I(x_{ti} \neq 0). \tag{5.18}
\]

We define ∆* = m*1 − m*2. That completes the definition of our test statistic, T*. Our simple point estimates for the location and dispersion are invariant to the mixing parameters, hence T* is invariant to them as well. In fact, our location parameter, E[log(x_{ti}/x_{tD}) | x_{ti} ≠ 0 & x_D ≠ 0], was defined so as to be invariant to the mixing parameters.

5.2.4 Test Definition

We assume we have two samples from logistic normal mixture distributions with the same variance-covariance parameter, and that the distribution of zeros in the two samples is such that no permutation sample, X1 or X2 ever has all zeros for some component.

We use X/X to denote the set of all permutations of X, which is the sample space for our test. Our probability measure assigns a probability mass of 1/n! to each permutation X* ∈ X/X. We define the permutation support as the set T_X = {T* = T(X*) : X* ∈ X/X}.

Our statistic, T*, roughly Hotelling's T-squared, is a quadratic form whose values are nonnegative reals. The larger the value of T*, the larger the difference between the means of groups 1 and 2. We use T° = T(X) to refer to the observed value of the statistic. If T° is in the 100(1 − α) percentile of the empirical distribution of T*, we reject the null hypothesis and conclude the means are different.

\[
\text{Let } \mathbf{x}_1, \ldots, \mathbf{x}_{n_1} \overset{\text{i.i.d.}}{\sim} \mathcal{L}(\boldsymbol{\mu}, \Omega, \boldsymbol{\pi}) \text{ be from sample 1, and} \tag{5.19}
\]
\[
\text{let } \mathbf{x}_{n_1+1}, \ldots, \mathbf{x}_{n} \overset{\text{i.i.d.}}{\sim} \mathcal{L}(\boldsymbol{\mu}+\boldsymbol{\Delta}, \Omega, \boldsymbol{\pi}) \text{ be from sample 2.} \tag{5.20}
\]

Our null and alternative hypotheses are:

H0 : ∆ = 0, (5.21)

Ha : ∆ ≠ 0. (5.22)

We define our test, ψ, as follows.

\[
\text{Let } \psi =
\begin{cases}
1 & \text{if } \Pr\{T^{*} \geq T^{o} \mid X/X\} \leq \alpha,\\
0 & \text{otherwise.}
\end{cases}
\]

In terms of the n! permutations X* ∈ X/X we have:

\[
\psi =
\begin{cases}
1 & \text{if } \dfrac{1}{n!}\displaystyle\sum_{X^{*}\in X/X} I[T(X^{*}) \geq T^{o}] \leq \alpha,\\
0 & \text{otherwise.}
\end{cases}
\]

The significance level of the test is

\[
p = \frac{1}{n!}\sum_{X^{*}\in X/X} I[T(X^{*}) \geq T^{o}].
\]

Because of the discrete nature of the permutation sample space, it is not always possible to design a test of exactly size α. However, the test does produce exact significance levels. If the permutation sample space is too large to compute all possible values T(X*), we can estimate the significance level by random sampling with replacement from the permutation sample space. In that case, for a sample S_k ⊂ X/X of size k, S_k = {X*_1, X*_2, ..., X*_k}, define

\[
\hat{p}_k = \frac{1}{k}\sum_{X_i^{*}\in S_k} I[T(X_i^{*}) \geq T^{o}].
\]

Clearly, k p̂_k has a binomial distribution: we are sampling from a pool of two kinds of things, statistics which are less than T°, and statistics which are greater than or equal to T°. As samples get bigger, p̂_k approaches the true proportion:

\[
\lim_{k\to\infty} \hat{p}_k = \frac{1}{n!}\sum_{X^{*}\in X/X} I[T(X^{*}) \geq T^{o}].
\]

5.2.5 Procedure

Decide whether to do all $\binom{n_1+n_2}{n_1}$ distinguishable permutations, or to sample B times.

Procedure A – random sampling

Compute T°.
Compute S_p^{-1}.
Repeat for j = 1 to B:
1. Get a new random permutation, X*_j.
2. Compute and record T*_j = ∆*^T S_p^{-1} ∆*.
After B iterations, compute p-value = #(T*_j ≥ T°)/B, where # means "number of".
Reject H0 in favor of H1 if p-value ≤ α, where α is the desired size of the test, e.g., 0.05.

Procedure B – complete enumeration

Compute T°.
Compute S_p^{-1}.
For each of the $\binom{n}{n_1}$ distinguishable permutations, let j index the jth permutation. Repeat for all distinguishable permutations:
1. Get the jth permutation, X*_j.
2. Compute and record T*_j = ∆*^T S_p^{-1} ∆*.
After all iterations, compute p-value = #(T*_j ≥ T°)/$\binom{n}{n_1}$.
Reject H0 in favor of H1 if p-value ≤ α, where α is the desired size of the test, e.g., 0.05.
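Here is a compact Python sketch of Procedure A. It assumes the last component of every composition is positive (as in the SAA data of Chapter 6) and, as in Section 5.2.4, that every other component is nonzero somewhere in each permuted group; function and variable names are ours, not the dissertation's code.

```python
import numpy as np

def group_means(X, ref=-1):
    """Component-wise means of log(x_i/x_ref) over rows where x_i > 0."""
    d = X.shape[1] - 1
    return np.array([np.log(X[X[:, i] > 0, i] / X[X[:, i] > 0, ref]).mean()
                     for i in range(d)])

def t_stat(X1, X2, Sp_inv):
    delta = group_means(X1) - group_means(X2)
    return float(delta @ Sp_inv @ delta)

def permutation_test(X1, X2, Sp_inv, B=10_000, rng=None):
    """Procedure A: sample B random permutations of the pooled rows.
    Sp_inv is computed once from the original samples and reused unchanged."""
    rng = np.random.default_rng(rng)
    X = np.vstack([X1, X2])
    n1 = len(X1)
    T_obs = t_stat(X1, X2, Sp_inv)
    count = 0
    for _ in range(B):
        perm = rng.permutation(len(X))
        Xp = X[perm]
        if t_stat(Xp[:n1], Xp[n1:], Sp_inv) >= T_obs:
            count += 1
    return T_obs, count / B     # observed statistic and estimated p-value
```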

5.3 Comparison of Mixing Parameters

For testing the mixing parameters, there are readily available tests. The two samples yield a 2×k contingency table, where k is the number of different zero patterns occurring in the data. If the counts are all large enough, a chi-squared test is appropriate. Otherwise we can use a multivariate version of Fisher's Exact Test (Agresti (2002), pp. 97–98).

Chapter 6

A Proteomic Application

6.1 Serum Amyloid A Compositions in a Diabetes Study

In this chapter we illustrate our model and permutation test by applying them to a set of proteomic data, compositions from assays of serum amyloid A (SAA) from subjects in a diabetes study. We test whether the mean composition is different for two groups, diabetic and nondiabetic patients. In the end we find no significant difference; the point is just the illustration of the test. The recruitment and study protocols are described in Yassine et al. (2015). The mass spectrometric process for measuring different isoforms of serum amyloid A in blood plasma is described in Trenchevska et al. (2016). The measurements are compositions, proportions of area under a mass spectrometry curve where each peak in the curve corresponds to a different isoform of the protein. For our illustration, we have partitioned the data into two sets, diabetic (n = 98) and nondiabetic (n = 48). Non-diabetes is defined as fasting glucose below 100 mg/dL; diabetes is fasting glucose above 125 mg/dL. We excluded a third group of subjects classified as prediabetic. We limited the data to the first visit for each subject, and if there was more than one aliquot, we used the first.

6.1.1 SAA Proteomic Data

We consider a subset of the SAA data involving six isoforms. SAA 1.1.RS and SAA 1.1.R are posttranslational modifications of SAA 1.1, coded for by the SAA1 gene. They are lacking one N-terminal arginine (R) or the arginine-serine dipeptide (RS).

SAA 2.2.RS and SAA 2.2.R are posttranslational modifications of SAA 2.2, coded for by the SAA2 gene. They are lacking one N-terminal arginine (R) or the arginine-serine dipeptide (RS). The SAA 1.1 component is always positive, so we use it as our reference component. The first few lines of each set are shown in Tables (6.1) and (6.2). The full data sets are listed in an appendix.

Table 6.1: Snippet of Serum Amyloid A data from non-diabetic subjects

    2.2.RS   2.2.R    2.2      1.1.RS   1.1.R    1.1
1   0.00000  0.00000  0.00000  0.19138  0.41765  0.39097
2   0.05060  0.05060  0.09889  0.09750  0.29552  0.40689
3   0.00000  0.00000  0.00000  0.11556  0.49448  0.38996
4   0.00000  0.16030  0.15074  0.16384  0.29476  0.23036

Table 6.2: Snippet of Serum Amyloid A data from diabetic subjects

    2.2.RS   2.2.R    2.2      1.1.RS   1.1.R    1.1
1   0.00000  0.00000  0.00000  0.20456  0.40643  0.38901
2   0.00000  0.07385  0.10245  0.08359  0.29608  0.44403
3   0.05370  0.10561  0.12317  0.11286  0.24184  0.36282
4   0.08874  0.11629  0.10494  0.13493  0.30592  0.24919

6.2 Mixing Parameters

In the full samples, there are four different patterns of zeros. In all of the data, the value for the SAA 1.1 component is positive, and we use it as our reference. Since it is always positive, when discussing and testing for equivalence of patterns of zeros, we omit it. A point with zeros only in the first component is represented 01111; zeros only in the first and second components is 00111; zeros in the first, second, and third components is 00011; no zeros is 11111. Table (6.3) shows the counts of the different patterns for the two samples.

Table 6.3: Counts of Zero Patterns in the Data

         non-diabetic   diabetic
00011    23             35
01111     8             16
11111    17             46
00111     0              1

First we test whether the non-diabetic and diabetic populations have the same mixing parameters. If the last row in Table 6.3 did not have such small numbers, 0 and 1, we might use a chi-squared test, but here we use a two-sided Fisher's Exact Test (Agresti (2002), pp. 97–98). The p-value is 0.445, so there is no indication of a significant difference.
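For orientation, the sketch below builds the 2×4 table of zero-pattern counts from Table 6.3 and applies the large-sample chi-squared test. The dissertation instead uses a two-sided Fisher's Exact Test for an r×c table (available, for example, as fisher.test in R) because of the small counts in the last column, so this chi-squared call is only an approximate stand-in.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: non-diabetic, diabetic.  Columns: zero patterns 00011, 01111, 11111, 00111.
counts = np.array([[23,  8, 17, 0],
                   [35, 16, 46, 1]])

# Large-sample test of identical mixing parameters; with cells this small an
# exact r x c test is preferable.
chi2, p, dof, expected = chi2_contingency(counts)
print(chi2, p, dof)
```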

6.3 Variance

Next we check the covariance matrices of the samples to make sure they are not significantly different. The simple estimator, Ω*, for the diabetes sample covariance is in Table (6.4).

Table 6.4: Simple Covariance Estimate from Diabetes Sample

                  log(2.2.RS/1.1)  log(2.2.R/1.1)  log(2.2/1.1)  log(1.1.RS/1.1)  log(1.1.R/1.1)
log(2.2.RS/1.1)   0.27             0.19            0.16          0.16             0.07
log(2.2.R/1.1)    0.19             0.19            0.15          0.12             0.08
log(2.2/1.1)      0.16             0.15            0.14          0.09             0.04
log(1.1.RS/1.1)   0.16             0.12            0.09          0.13             0.06
log(1.1.R/1.1)    0.07             0.08            0.04          0.06             0.07

It is positive definite. The simple estimate for the covariance from the non-diabetic subjects, however, is not positive definite, so we use the estimate from only the nonzero cases. There are 17 rows with no zeros in the normal sample, and the estimate is in Table (6.5).

Table 6.5: Simple Covariance Estimate from Non-Diabetes Sample

                  log(2.2.RS/1.1)  log(2.2.R/1.1)  log(2.2/1.1)  log(1.1.RS/1.1)  log(1.1.R/1.1)
log(2.2.RS/1.1)   0.35             0.25            0.21          0.15             0.02
log(2.2.R/1.1)    0.25             0.23            0.17          0.13             0.05
log(2.2/1.1)      0.21             0.17            0.15          0.09             0.01
log(1.1.RS/1.1)   0.15             0.13            0.09          0.10             0.04
log(1.1.R/1.1)    0.02             0.05            0.01          0.04             0.06

Before pooling the two covariance estimates, if we use Box’s M-test to check whether the covariances are roughly the same, we find no significant difference. The p-value is 0.44.

6.4 Data Plots

The alr̃-transformed data are shown in Figure 6.1.

Figure 6.1: Diabetes and non-diabetes SAA data. The variables have been alr̃-transformed with respect to the reference component, SAA 1.1. [Panels show log(2.2.RS/1.1), log(2.2.R/1.1), log(2.2/1.1), log(1.1.RS/1.1), and log(1.1.R/1.1) for the diabetes and normal groups.]

6.5 Permutation Test for Means

Finally we use the permutation test to check whether the mean vectors are the same, i.e., ∆ = 0. For our particular example, there are $\binom{146}{48} > 1.0\times 10^{39}$ possible permutations to test in the exact test, so we use the sampling method of the permutation test instead. When we perform the permutation test with 10,000 samples, the time is about 0.5 seconds. The p-value is 0.19, so there is no strong reason to reject the null hypothesis of ∆ = 0. The estimated means for the two groups are shown in Table 6.6. This concludes the test.

Table 6.6: Simple mean estimates for SAA data

                    2.2.RS  2.2.R   2.2     1.1.RS  1.1.R   1.1
Non-diabetic mean   0.063   0.103   0.110   0.113   0.301   0.310
Diabetic mean       0.061   0.110   0.116   0.106   0.279   0.328

Chapter 7

Conclusion

Essential zeros in compositional data have long been problematic. Aitchison and Egozcue (2005) wrote, "One of the tantalizing remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential or structural zeros." More recently, Martín-Fernández et al. (2015) wrote, "At the present moment, there is not a general methodology for dealing with essential zeros." Based on the approach of Aitchison and Kay (2003) we have developed a statistical model for such data, a mixture of logistic normals of different dimensions with parameters that are projections of a shared parameter. We have defined different location estimators and explored their properties. The MLE is difficult to find and does not preserve subcompositional coherence, but we have defined a simple location estimator that is consistent and easy to compute. We have also defined a simple dispersion estimator. Using these estimators, we developed a two-group test of means, and illustrated its use on data obtained from mass spectrometric methods in a recent diabetes study.

One of the main contributions of this dissertation is a new understanding of the principle of subcompositional coherence. We developed a new, precise definition, and showed that the principle applies to estimators, not (just) distributions. There is much more that can be done in terms of future work. The two most obvious extensions would be to develop a regression model, and general purpose software for finding the joint MLE for arbitrary dimensions.

Appendix A

Variances of Estimators

A.1 Variance of Location MLE, µ̂

Next we derive the variance of the location MLE, µ̂ | Ω, r1, r2, r3. First we rewrite the expression (3.35) so that each of the ȳ terms stands alone.

\[
\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3 =
\frac{(r_3^{2}+r_2 r_3)s_{11}s_{22}\,\bar{y}_{31} - r_2 r_3 s_{11} s_{12}\,\bar{y}_{32} + r_2 r_3 s_{11} s_{12}\,\bar{y}_{22} + \bigl((r_1 r_3 + r_1 r_2)s_{11}s_{22} - r_1 r_2 s_{12}^{2}\bigr)\bar{y}_{11}}
     {(r_3^{2} + (r_2+r_1) r_3 + r_1 r_2)\, s_{11} s_{22} - r_1 r_2 s_{12}^{2}} \tag{A.1}
\]

To find the variance of µ̂1 | Ω, r1, r2, r3, we need to replace r1ȳ11 with Σ_{j=1}^{r1} y_{1j1}; r2ȳ22 with Σ_{j=1}^{r2} y_{2j2}; r3ȳ31 with Σ_{j=1}^{r3} y_{3j1}; and r3ȳ32 with Σ_{j=1}^{r3} y_{3j2}. We also make some other substitutions to simplify the algebra.

\[
\begin{aligned}
\text{Let } k_{31} &= (r_3 + r_2)\, s_{11} s_{22}. &&(A.2)\\
\text{Let } k_{32} &= r_2\, s_{11} s_{12}. &&(A.3)\\
\text{Let } k_{22} &= r_3\, s_{11} s_{12}. &&(A.4)\\
\text{Let } k_{11} &= (r_3 + r_2)\, s_{11} s_{22} - r_2 s_{12}^{2}. &&(A.5)\\
\text{Let } k_{\mathrm{denom}} &= (r_3^{2} + (r_2 + r_1) r_3 + r_1 r_2)\, s_{11} s_{22} - r_1 r_2 s_{12}^{2}. &&(A.6)
\end{aligned}
\]

With these in place, we get

\[
\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3
= \frac{1}{k_{\mathrm{denom}}}\Bigl( k_{31}\sum_{j=1}^{r_3} y_{3j1} - k_{32}\sum_{j=1}^{r_3} y_{3j2} + k_{22}\sum_{j=1}^{r_2} y_{2j2} + k_{11}\sum_{j=1}^{r_1} y_{1j1} \Bigr)
\]
\[
\phantom{\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3}
= \frac{1}{k_{\mathrm{denom}}}\Bigl( \sum_{j=1}^{r_3}\bigl(k_{31} y_{3j1} - k_{32} y_{3j2}\bigr) + k_{22}\sum_{j=1}^{r_2} y_{2j2} + k_{11}\sum_{j=1}^{r_1} y_{1j1} \Bigr). \tag{A.7}
\]

The y_{2j2} are i.i.d. univariate normal; the y_{1j1} are i.i.d. univariate normal; and the y_{3j} are i.i.d. bivariate normal, so the variance of the estimator is:

\[
\operatorname{Var}(\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3)
= \frac{1}{k_{\mathrm{denom}}^{2}}\Bigl[ \operatorname{Var}\Bigl(\sum_{j=1}^{r_3}(k_{31} y_{3j1} - k_{32} y_{3j2})\Bigr) + \operatorname{Var}\Bigl(k_{22}\sum_{j=1}^{r_2} y_{2j2}\Bigr) + \operatorname{Var}\Bigl(k_{11}\sum_{j=1}^{r_1} y_{1j1}\Bigr)\Bigr]. \tag{A.8}
\]

Var(y2j2) = s22 and Var(y1j1) = s11, so

\[
\operatorname{Var}(\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3)
= \Bigl(\frac{1}{k_{\mathrm{denom}}}\Bigr)^{2}\Bigl[ \operatorname{Var}\Bigl(\sum_{j=1}^{r_3}(k_{31} y_{3j1} - k_{32} y_{3j2})\Bigr) + k_{22}^{2}\, r_2 s_{22} + k_{11}^{2}\, r_1 s_{11}\Bigr]. \tag{A.9}
\]

To find the variance of the remaining sum requires the facts that y3j are i.i.d., and that Cov(y3j1, y3j2) = s12.

\[
\begin{aligned}
\operatorname{Var}\Bigl(\sum_{j=1}^{r_3}(k_{31} y_{3j1} - k_{32} y_{3j2})\Bigr)
&= \sum_{j=1}^{r_3}\operatorname{Var}(k_{31} y_{3j1} - k_{32} y_{3j2})\\
&= \sum_{j=1}^{r_3}\bigl[\operatorname{Var}(k_{31} y_{3j1}) + \operatorname{Var}(k_{32} y_{3j2}) - 2 k_{31} k_{32}\operatorname{Cov}(y_{3j1}, y_{3j2})\bigr]\\
&= \sum_{j=1}^{r_3}\bigl[k_{31}^{2} s_{11} + k_{32}^{2} s_{22} - 2 k_{31} k_{32} s_{12}\bigr]\\
&= r_3\bigl[k_{31}^{2} s_{11} + k_{32}^{2} s_{22} - 2 k_{31} k_{32} s_{12}\bigr].
\end{aligned}\tag{A.10}
\]

With that we can write the variance of the MLE, µ̂1:

\[
\operatorname{Var}(\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3)
= \frac{1}{k_{\mathrm{denom}}^{2}}\Bigl[ r_3\bigl(k_{31}^{2} s_{11} + k_{32}^{2} s_{22} - 2 k_{31} k_{32} s_{12}\bigr) + k_{22}^{2}\, r_2 s_{22} + k_{11}^{2}\, r_1 s_{11}\Bigr]. \tag{A.11}
\]

Substituting the values for the k’s back in gives:

\[
\operatorname{Var}(\hat{\mu}_1 \mid \Omega, r_1, r_2, r_3)
= \frac{r_3\bigl((r_3+r_2)^{2} s_{11}^{3} s_{22}^{2} + r_2^{2} s_{11}^{2} s_{12}^{2} s_{22} - 2 r_2 (r_3+r_2) s_{11}^{2} s_{12}^{2} s_{22}\bigr) + r_2 r_3^{2} s_{11}^{2} s_{12}^{2} s_{22} + r_1 s_{11}\bigl((r_3+r_2) s_{11} s_{22} - r_2 s_{12}^{2}\bigr)^{2}}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22} - r_1 r_2 s_{12}^{2}\bigr)^{2}} \tag{A.12}
\]

Symmetry also gives the variance of µ̂2 given Ω:

\[
\operatorname{Var}(\hat{\mu}_2 \mid \Omega, r_1, r_2, r_3)
= \frac{r_3\bigl((r_3+r_1)^{2} s_{22}^{3} s_{11}^{2} + r_1^{2} s_{22}^{2} s_{12}^{2} s_{11} - 2 r_1 (r_3+r_1) s_{22}^{2} s_{12}^{2} s_{11}\bigr) + r_1 r_3^{2} s_{22}^{2} s_{12}^{2} s_{11} + r_2 s_{22}\bigl((r_3+r_1) s_{22} s_{11} - r_1 s_{12}^{2}\bigr)^{2}}
       {\bigl((r_3^{2}+(r_2+r_1) r_3 + r_1 r_2) s_{11} s_{22} - r_1 r_2 s_{12}^{2}\bigr)^{2}} \tag{A.13}
\]

A.2 Variance of Simple Location Estimator, µ*

Our simple estimator for the location is µ* = (µ*1, µ*2)^T. Here we concern ourselves with Var(µ*1) and then rely on symmetry to arrive at the variance of µ*2.

\[
\mu^{*}_{1} = \frac{1}{r_1 + r_3}\Bigl[\sum_{j=1}^{r_1} y_{1j1} + \sum_{j=1}^{r_3} y_{3j1}\Bigr] \tag{A.14}
\]
\[
\begin{aligned}
\operatorname{Var}(\mu^{*}_{1})
&= \operatorname{Var}\Bigl(\frac{1}{r_1+r_3}\Bigl[\sum_{j=1}^{r_1} y_{1j1} + \sum_{j=1}^{r_3} y_{3j1}\Bigr]\Bigr)\\
&= \frac{1}{(r_1+r_3)^{2}}\operatorname{Var}\Bigl(\sum_{j=1}^{r_1} y_{1j1} + \sum_{j=1}^{r_3} y_{3j1}\Bigr)\\
&= \frac{1}{(r_1+r_3)^{2}}\bigl(r_1 s_{11} + r_3 s_{11}\bigr)\\
&= \frac{s_{11}}{r_1 + r_3}.
\end{aligned}\tag{A.15}
\]
\[
\text{By symmetry, } \operatorname{Var}(\mu^{*}_{2}) = \frac{s_{22}}{r_2 + r_3}. \tag{A.16}
\]

Appendix B

Serum Amyloid A Data

Table B.1: Serum Amyloid A data from non-diabetic subjects

2.2.RS 2.2.R 2.2 1.1.RS 1.1.R 1.1 1 0.00000 0.00000 0.00000 0.19138 0.41765 0.39097 2 0.05060 0.05060 0.09889 0.09750 0.29552 0.40689 3 0.00000 0.00000 0.00000 0.11556 0.49448 0.38996 4 0.00000 0.16030 0.15074 0.16384 0.29476 0.23036 5 0.04055 0.07727 0.08383 0.10559 0.30823 0.38452 6 0.00000 0.00000 0.00000 0.17088 0.36370 0.46543 7 0.07418 0.13833 0.13090 0.10278 0.28911 0.26471 8 0.00000 0.00000 0.00000 0.13591 0.55721 0.30688 9 0.00000 0.10344 0.09890 0.11038 0.33014 0.35713 10 0.00000 0.00000 0.00000 0.15560 0.39481 0.44959 11 0.00000 0.16896 0.16048 0.16472 0.26723 0.23860 12 0.00000 0.00000 0.00000 0.15971 0.41886 0.42143 13 0.07019 0.13471 0.11589 0.12410 0.31928 0.23583 14 0.00000 0.11941 0.11541 0.12608 0.33756 0.30153 15 0.02804 0.08099 0.09046 0.10244 0.33400 0.36407 16 0.00000 0.00000 0.00000 0.19224 0.39187 0.41590 17 0.00000 0.00000 0.00000 0.16759 0.32348 0.50894 18 0.08165 0.13159 0.15458 0.11474 0.23444 0.28300 19 0.00000 0.00000 0.00000 0.16351 0.40746 0.42904 20 0.00000 0.00000 0.00000 0.16932 0.45161 0.37906 21 0.00000 0.00000 0.00000 0.11969 0.48134 0.39897 22 0.00000 0.00000 0.00000 0.15394 0.46898 0.37709 23 0.00000 0.00000 0.00000 0.10018 0.39315 0.50668 24 0.00000 0.00000 0.00000 0.10492 0.40297 0.49211 25 0.09408 0.12163 0.12632 0.12954 0.23300 0.29543 26 0.09277 0.12457 0.12890 0.12428 0.25491 0.27457 27 0.03589 0.08206 0.08910 0.14846 0.31425 0.33024 28 0.00495 0.07137 0.11312 0.05578 0.25062 0.50416 29 0.11840 0.14624 0.15392 0.13600 0.21248 0.23296 30 0.00000 0.00000 0.00000 0.16150 0.51424 0.32426 31 0.07513 0.12883 0.10800 0.15040 0.33045 0.20719 32 0.05222 0.09142 0.07984 0.11268 0.40129 0.26255 33 0.02745 0.06542 0.07594 0.08792 0.35919 0.38408 34 0.00000 0.00000 0.00000 0.11102 0.52521 0.36376 35 0.00000 0.00000 0.00000 0.25579 0.43655 0.30766 36 0.00000 0.00000 0.00000 0.14961 0.47864 0.37175 37 0.09826 0.14659 0.16777 0.11863 0.20172 0.26703 38 0.00000 0.06267 0.07833 0.07618 0.28321 0.49961 39 0.08001 0.10035 0.10891 0.12149 0.24538 0.34386 40 0.00000 0.00000 0.00000 0.12830 0.29730 0.57440 41 0.00000 0.00000 0.00000 0.15358 0.42449 0.42193 42 0.00000 0.09917 0.12131 0.09670 0.27439 0.40843 43 0.04292 0.10368 0.09036 0.09780 0.37240 0.29283 44 0.00000 0.08851 0.09518 0.17234 0.33376 0.31021 45 0.00000 0.00000 0.00000 0.14078 0.43265 0.42656 46 0.00000 0.00000 0.00000 0.16310 0.37919 0.45772 47 0.05611 0.12447 0.12291 0.09185 0.29860 0.30606 48 0.00000 0.06622 0.09424 0.06539 0.24451 0.52964 49 0.00000 0.00000 0.00000 0.15508 0.33885 0.50607

Table B.2: Serum Amyloid A data from diabetic subjects

2.2.RS 2.2.R 2.2 1.1.RS 1.1.R 1.1 1 0.00000 0.00000 0.00000 0.20456 0.40643 0.38901 2 0.00000 0.07385 0.10245 0.08359 0.29608 0.44403 3 0.05370 0.10561 0.12317 0.11286 0.24184 0.36282 4 0.08874 0.11629 0.10494 0.13493 0.30592 0.24919 5 0.00000 0.08473 0.09056 0.10706 0.31535 0.40230 6 0.00000 0.08647 0.08979 0.13664 0.32533 0.36177 7 0.00000 0.00000 0.00000 0.15153 0.38953 0.45894 8 0.00000 0.00000 0.00000 0.12304 0.41041 0.46655 9 0.04993 0.12623 0.12344 0.09015 0.28336 0.32690 10 0.00000 0.00000 0.00000 0.10799 0.36697 0.52504 11 0.04280 0.08423 0.11315 0.08638 0.23823 0.43522 12 0.05909 0.09124 0.08777 0.10637 0.29609 0.35944 86

13 0.07005 0.14511 0.17272 0.08914 0.20497 0.31801 14 0.00000 0.00000 0.00000 0.12439 0.40576 0.46985 15 0.00000 0.00000 0.00000 0.17634 0.31248 0.51118 16 0.05397 0.12186 0.17570 0.08480 0.22258 0.34108 17 0.00000 0.00000 0.00000 0.18621 0.39165 0.42214 18 0.09033 0.17061 0.18005 0.10297 0.20373 0.25231 19 0.00000 0.11138 0.12087 0.11924 0.27913 0.36938 20 0.00000 0.00000 0.00000 0.12672 0.39444 0.47884 21 0.00000 0.00000 0.00000 0.13953 0.41600 0.44447 22 0.00000 0.12921 0.12945 0.11467 0.29527 0.33139 23 0.00000 0.00000 0.00000 0.10632 0.39665 0.49702 24 0.00000 0.00000 0.00000 0.15498 0.40812 0.43690 25 0.00000 0.00000 0.00000 0.17136 0.42262 0.40602 26 0.06465 0.10699 0.10675 0.09980 0.25237 0.36944 27 0.00000 0.06235 0.07788 0.08888 0.28552 0.48537 28 0.00000 0.08596 0.08317 0.14922 0.33387 0.34778 29 0.04882 0.11472 0.11201 0.11701 0.32480 0.28264 30 0.00000 0.07578 0.09145 0.09289 0.25586 0.48402 31 0.00000 0.00000 0.00000 0.08848 0.36362 0.54790 32 0.00000 0.00000 0.00000 0.10489 0.37262 0.52249 33 0.04415 0.10964 0.10476 0.08491 0.29516 0.36139 34 0.00000 0.00000 0.00000 0.15813 0.38768 0.45419 35 0.00000 0.00000 0.00000 0.11056 0.43472 0.45472 36 0.06026 0.10465 0.11259 0.14533 0.26188 0.31528 37 0.05135 0.08300 0.07831 0.12028 0.32192 0.34513 38 0.00000 0.07886 0.09396 0.11896 0.28249 0.42572 39 0.00000 0.00000 0.00000 0.20550 0.37739 0.41711 40 0.00000 0.07483 0.09027 0.08865 0.29565 0.45060 41 0.00000 0.00000 0.00000 0.11920 0.28985 0.59095 42 0.04793 0.14933 0.13476 0.07314 0.28497 0.30988 43 0.03238 0.08073 0.09662 0.06972 0.23577 0.48477 44 0.04485 0.12984 0.08599 0.10552 0.39193 0.24187 45 0.04456 0.10173 0.11272 0.12838 0.23906 0.37355 46 0.00000 0.00000 0.00000 0.14404 0.50472 0.35124 47 0.00000 0.00000 0.18633 0.14465 0.27184 0.39718 48 0.00000 0.00000 0.00000 0.19595 0.41956 0.38450 49 0.03469 0.08182 0.09942 0.08081 0.29554 0.40772 50 0.00000 0.10654 0.11013 0.13115 0.28710 0.36507 51 0.00000 0.09768 0.11622 0.09313 0.28860 0.40437 52 0.05342 0.11230 0.11775 0.12211 0.28867 0.30575 53 0.07499 0.15610 0.17913 0.10447 0.20700 0.27832 54 0.06551 0.12012 0.16277 0.10524 0.21257 0.33379 55 0.00000 0.11756 0.10683 0.12093 0.37645 0.27823 56 0.03210 0.06580 0.08980 0.08820 0.29226 0.43184 57 0.00000 0.14921 0.14105 0.15447 0.29789 0.25737 58 0.00000 0.00000 0.00000 0.23962 0.40036 0.36002 59 0.00000 0.00000 0.00000 0.10586 0.44427 0.44987 60 0.05824 0.14153 0.15668 0.08978 0.26451 0.28925 61 0.00000 0.00000 0.00000 0.20869 0.37746 0.41385 62 0.00000 0.00000 0.00000 0.16049 0.44320 0.39631 63 0.06502 0.14645 0.14645 0.11019 0.26291 0.26899 64 0.00000 0.00000 0.00000 0.13097 0.32533 0.54370 65 0.05507 0.10058 0.09153 0.11240 0.37641 0.26402 66 0.04458 0.08857 0.09585 0.11057 0.30513 0.35530 67 0.05709 0.11554 0.14119 0.08966 0.23709 0.35944 68 0.00000 0.00000 0.00000 0.19701 0.43279 0.37019 69 0.00000 0.00000 0.00000 0.13275 0.37184 0.49541 70 0.06796 0.10935 0.12230 0.10577 0.27712 0.31749 71 0.02977 0.08524 0.09851 0.09155 0.32140 0.37353 72 0.06661 0.14855 0.12752 0.10453 0.29139 0.26139 73 0.09243 0.14642 0.16243 0.11576 0.22237 0.26058 74 0.14779 0.15930 0.16851 0.15884 0.18232 0.18324 75 0.00000 0.00000 0.00000 0.12338 0.38231 0.49430 76 0.06111 0.11792 0.11863 0.10400 0.31387 0.28446 77 0.00000 0.00000 0.00000 0.11330 0.40563 0.48107 78 0.06716 0.17481 0.12307 0.11259 0.30639 0.21598 79 0.00000 0.00000 0.00000 0.23092 0.32457 0.44452 80 0.03355 0.07693 0.10191 0.08707 0.28410 0.41643 81 0.12351 0.14044 0.14583 0.14159 0.22239 0.22624 82 0.04755 0.12075 0.12096 0.09778 
0.31166 0.30130 83 0.06277 0.11647 0.12010 0.10785 0.29648 0.29632 84 0.07556 0.11014 0.11903 0.12000 0.33372 0.24155 85 0.00000 0.00000 0.00000 0.13201 0.35469 0.51330 86 0.09261 0.12922 0.15899 0.12261 0.20529 0.29128 87 0.10679 0.13407 0.13041 0.13540 0.27013 0.22322 88 0.00000 0.11424 0.12228 0.12110 0.26750 0.37488 89 0.00000 0.00000 0.00000 0.15258 0.28419 0.56323 90 0.00000 0.13281 0.13346 0.12202 0.28819 0.32352 91 0.04474 0.08720 0.09520 0.11600 0.29526 0.36159 92 0.04218 0.13193 0.10392 0.07845 0.32652 0.31700 93 0.04605 0.13196 0.08120 0.10624 0.39241 0.24214 94 0.05253 0.09235 0.09954 0.12377 0.30583 0.32598 95 0.00000 0.00000 0.00000 0.13003 0.41973 0.45024 96 0.07234 0.09766 0.10713 0.10059 0.25853 0.36376 97 0.00000 0.00000 0.00000 0.07112 0.35855 0.57033 98 0.00000 0.00000 0.00000 0.20518 0.41228 0.38254 87

References

Agresti, A. (2002). Categorical Data Analysis. Wiley, second edition.

Aitchison, J. (1986). The statistical analysis of compositional data. Chapman & Hall, Ltd.

Aitchison, J. (1994). Principles of compositional data analysis, pages 73–81. In Anderson et al. (1994).

Aitchison, J. (1999). Logratios and natural laws in compositional data analysis. Mathematical Geology, 31(5), 563–580.

Aitchison, J. and Egozcue, J. J. (2005). Compositional data analysis: where are we and where should we be heading? Mathematical Geology, 37(7), 829–850.

Aitchison, J. and Kay, J. W. (2003). Possible solution of some essential zero problems in compositional data analysis. In Proceedings of CoDaWork 2003, The First Compositional Data Analysis Workshop. Universitat de Girona, Departament d'Informàtica i Matemàtica Aplicada. http://ima.udg.edu/Activitats/CoDaWork03/paper_Aitchison_and_Kay.pdf.

Aitchison, J., Barceló-Vidal, C., and Pawlowsky-Glahn, V. (2002). Some comments on compositional data analysis in archaeometry, in particular the fallacies in Tangri and Wright's dismissal of logratio analysis. Archaeometry, (44), 295–304.

Anderson, T. W., Olkin, I., and Fang, K., editors (1994). Multivariate analysis and its applications. Institute of Mathematical Statistics, Hayward, CA.

Billheimer, D., Guttorp, P., and Fagan, W. F. (2001). Statistical interpretation of species composition. Journal of the American Statistical Association, 96(456), 1205–1214. http://dx.doi.org/10.1198/016214501753381850.

Butler, A. and Glasbey, C. (2008). A latent Gaussian model for compositional data with zeros. Journal of the Royal Statistical Society: Series C (Applied Statistics), 57(5), 505–520.

Daunis-i-Estadella, J., Martín-Fernández, J., and Palarea-Albaladejo, J. (2008). Bayesian tools for count zeros in compositional data sets. In Proceedings of CODAWORK'08, The 3rd Compositional Data Analysis Workshop, May 27–30, University of Girona, Girona (Spain), CD-ROM.

Egozcue, J. (2009). Reply to "On the Harker variation diagrams..." by J. A. Cortés. Mathematical Geosciences, 41(7), 829–834.

Egozcue, J. J. and Pawlowsky-Glahn, V. (2011). Basic Concepts and Procedures, pages 12–28. In Pawlowsky-Glahn and Buccianti (2011).

Fry, J. M., Fry, T. R., and Mclaren, K. R. (2000). Compositional data analysis and zeros in micro data. Applied Economics, 32(8), 953–959. http://www.tandfonline.com/doi/pdf/10.1080/000368400322002.

Good, P. (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer.

Gourieroux, C., Monfort, A., and Trognon, A. (1984a). Pseudo maximum likelihood methods: Applications to poisson models. Econometrica, 52(3), pp. 701–720.

Gourieroux, C., Monfort, A., and Trognon, A. (1984b). Pseudo maximum likelihood methods: Theory. Econometrica, 52(3), pp. 681–700.

Kotz, S., Balakrishnan, N., and Johnson, N. L. (2004). Continuous multivariate distributions, models and applications, volume 1. John Wiley & Sons.

Lattin, J. M., Carroll, J. D., and Green, P. E. (2003). Analyzing multivariate data. Thomson Brooks/Cole, Pacific Grove, CA.

Lehmann, E. L. (2009). Parametric versus nonparametrics: two alternative methodologies. Journal of Nonparametric Statistics, 21(4), 397–405.

Leininger, T. J., Gelfand, A. E., Allen, J. M., and Silander Jr, J. A. (2013). Spatial regression modeling for compositional data with many zeros. Journal of Agricultural, Biological, and Environmental Statistics, 18(3), 314–334.

Martín-Fernández, J. A., Palarea-Albaladejo, J., and Olea, R. A. (2011). Dealing With Zeros, pages 43–58. In Pawlowsky-Glahn and Buccianti (2011).

Martín-Fernández, J. A., Hron, K., Templ, M., Filzmoser, P., and Palarea-Albaladejo, J. (2015). Bayesian-multiplicative treatment of count zeros in compositional data sets. Statistical Modelling, 15(2), 134–158.

Mullahy, J. (2010). Multivariate fractional regression estimation of econometric share models. Technical Report 16354, National Bureau of Economic Research. Available at: http://www.nber.org/papers/w16354 and http://works.bepress.com/johnmullahy/2.

Papke, L. E. and Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11(6), 619–632.

Papke, L. E. and Wooldridge, J. M. (2008). Panel data methods for fractional response variables with an application to test pass rates. Journal of Econometrics, 145(1), 121–133.

Pawlowsky-Glahn, V. and Buccianti, A., editors (2011). Compositional Data Analysis: Theory and Applications. Wiley.

Pesarin, F. (2001). Multivariate Permutation Tests with Applications in Biostatistics. Wiley.

Romano, J. P. (1990). On the behavior of randomization tests without a group invariance assumption. Journal of the American Statistical Association, 85(411), 686–692.

Scealy, J. and Welsh, A. (2011). Regression for compositional data by using distributions defined on the hypersphere. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 351–375.

Scealy, J. and Welsh, A. (2014). Colours and cocktails: Compositional data analysis. 2013 Lancaster Lecture. Australian & New Zealand Journal of Statistics, 56(2), 145–169.

Stephens, M. A. (1982). Use of the von Mises distribution to analyse continuous proportions. Biometrika, (69), 197–203.

Stewart, C. and Field, C. (2011). Managing the essential zeros in quantitative fatty acid signature analysis. Journal of Agricultural, Biological, and Environmental Statistics, 16(1), 45–69.

Stewart, C., Iverson, S., and Field, C. (2014). Testing for a change in diet using fatty acid signatures. Environmental and Ecological Statistics, 21(4), 775–792.

Trenchevska, O., Yassine, H. N., Borges, C. R., Nelson, R. W., and Nedelkov, D. (2016). Development of quantitative mass spectrometric immunoassay for serum amyloid A. Biomarkers, pages 1–34.

Yassine, H. N., Trenchevska, O., He, H., Borges, C. R., Nedelkov, D., Mack, W., Kono, N., Koska, J., Reaven, P. D., and Nelson, R. W. (2015). Serum amyloid A truncations in type 2 diabetes mellitus. PloS one, 10(1). http://dx.doi.org/10.1371/journal.pone.0115320.