Bivariate Models for the Analysis of Internal Nitrogen Use Efficiency: Mixture Models as an Exploratory Tool

Isabel Munoz Santa (Masters of Applied Science in Biometrics)

A thesis submitted for the degree of Masters of Applied Science in Biometrics in the University of Adelaide

School of Agriculture, Food and Wine

July 2014

Acknowledgments

I would like to express my deepest gratitude to my advisor Dr. Olena Kravchuck for her expertise, guidance, support and encouragement. This thesis would not have been possible without her. Thank you for your efforts to make me a better professional and for giving me the opportunity to study here! I would also like to thank my second supervisor Dr. Petra Marschner for her guidance in the biological aspects of my thesis and Dr. Stephan Haefele, who kindly provided the data for the case study of this thesis and provided support in the interpretation of the results.

I would like to acknowledge the Faculty of Science for providing the Turner Family Scholarship which supported this research and provided travel assistance to attend the Australian Statistical Conference, July 2014 Adelaide, Young Statistician Conference, February 2013 Melbourne, International Biometrics Society Conference, December 2014 Mandurah and Australian Statistical Conference in conjunction with the Institute of Mathematical Statistics Annual Meeting, July 2014 Sydney.

I would like to thank all the people at the Biometry Hub: Bev, David, Jules, Paul, Stephen and Wayne for their big smiles and for creating a positive working environment, as well as the wonderful group of statisticians at Waite, meeting regularly for the professional development discussions.

Thank you to all the friends I have met in Adelaide, where I have had one of the best professional, cultural and personal experiences of my life. Thanks to Negar, Mohsen, Casey, Amanda, Rodrigo, Mariana, Fien, Daniela, Diego, Kanch, Antonio, Maria, Ruben, Alfonso, Lidia, Pablo, Ana, Antonija, Roey, Chris, Konrad, Diana, Luis and Lorinda.

Finally, with deep love and admiration, I thank my family for their love and support from thousands of kilometres away and Martin for his support, love and contagious positive vision of life.

Contents

1 Motivations and thesis outline
  1.1 Feeding the world requires an efficient use of nitrogen fertilisers
  1.2 Nitrogen efficiency measures
  1.3 Strategies for improving the uptake and utilisation of nitrogen by cereals
  1.4 Review of grain yield and nitrogen uptake analyses in agricultural research
    1.4.1 Studies selected for the review
    1.4.2 Amount and format of grain yield and nitrogen uptake data
    1.4.3 Relationship between grain yield and nitrogen uptake
    1.4.4 Common methods of analysis of grain yield and nitrogen uptake field data and their limitations
  1.5 Objectives of the thesis
  1.6 Thesis outline

2 Ratios of jointly normal variables
  2.1 Introduction to the ratio of jointly normal variables
  2.2 Distribution of the ratio: history and properties
    2.2.1 Geary (1930) and Fieller (1932) expressions of the pdf
    2.2.2 Marsaglia (1965, 2006) expression of the pdf
    2.2.3 Pham-Gia et al. (2006) expression of the pdf
  2.3 Normal approximation of the pdf of the ratio
  2.4 Estimators of the ratio
    2.4.1 Point estimators: average of ratios, ratio of averages
    2.4.2 Confidence sets of the ratio of expected values
  2.5 On the distributional properties of internal nitrogen use efficiency in rice
  2.6 Summary

3 Fundamentals of finite mixture models
  3.1 Non-technical introduction
  3.2 Common use of mixture models
  3.3 Mathematical definition
  3.4 Classifying data into groups: label random vectors and posterior probabilities
  3.5 Maximum likelihood estimation of the mixture parameters
  3.6 The EM algorithm for the estimation of mixture parameters
  3.7 The EM algorithm for the estimation of parameters of mixtures of multivariate Gaussian distributions
    3.7.1 Illustration of the EM algorithm on simulated data
  3.8 Difficulties in selecting the MLE of mixture models of Gaussian distributions with heteroscedastic components
    3.8.1 Unboundedness of the likelihood function
    3.8.2 Multiple local maxima
    3.8.3 Spuriosities
  3.9 Strategy to select the MLE of mixtures with heteroscedastic Gaussian components
    3.9.1 Starting strategies for the EM algorithm
  3.10 Bayesian approach to estimating parameters of mixture models of multivariate Gaussian distributions
    3.10.1 The Gibbs sampler
    3.10.2 The Gibbs sampler for a mixture of multivariate Gaussian distributions
    3.10.3 Label switching problem
  3.11 Selecting the number of mixture components
    3.11.1 Information Criteria
    3.11.2 Likelihood ratio test for selecting the number of clusters
  3.12 Summary

4 Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool

5 Conclusions and future lines of research

Appendix A List of studies in the review

Appendix B Journal information of the studies in the review

Appendix C Equivalence between Pham-Gia et al. (2006) and Marsaglia (1965, 2006) expressions of the pdf of the ratio

Appendix D Application of the EM algorithm for estimating the parameters of a mixture of multivariate Gaussian distributions

Appendix E R code for fitting mixture models of univariate Gaussian distributions

Appendix F R code for fitting mixture models of bivariate Gaussian distributions

Abstract

Ratios are commonly used among plant and soil scientists, in particular to express the plant utilisation efficiency of macro- and micro-nutrients. The internal nutrient efficiency can be understood in terms of maximising yield per unit of nutrient in the plant. At present, internal nitrogen use efficiency (IEN) data are usually collected from designed field trials where different treatments are applied (e.g. fertiliser treatments) and analysed by univariate linear mixed models. However, univariate linear models on the ratio do not maintain information on the original traits, including their correlation, which presents a challenge when interpreting the effect of agronomic practices or environmental conditions on the process of nutrient conversion into grain. Moreover, the distributional properties of ratios do not comply with the assumptions of the linear models favoured in soil and plant science research. A more suitable approach is to collect the traits of interest and to use bivariate analyses. These analyses preserve the information on the original traits and avoid issues associated with the distributional properties of the ratio.

If the data come from field studies, different experimental and environmental conditions may lead to the presence of patterns (groups) in the data in addition to, or concurrently with, designed treatments. Researchers in plant and soil sciences may be interested in identifying those conditions, for example to understand the nature of genotype-by-environment interactions. The inspection of the groups may reveal the factors defining them, thus providing insight into the experimental or environmental drivers of the biological traits. Among bivariate analyses, bivariate mixture models of Gaussian distributions are an appropriate methodology for identifying clusters in the nutrient efficiency data, assuming that the traits are jointly normal. Studying this methodology for the analysis of the internal nitrogen use efficiency traits is the focus of the present thesis.

The application of bivariate mixture models is suggested here as a complementary analysis to bivariate mixed models in designed field trials and for exploratory purposes only. The exploratory and supplementary character of the mixture analysis is due to the potential violation of the independence assumption when the data are collected from designed field trials.

In this project, bivariate mixed and mixture models are applied to a real-life designed field trial on non-irrigated rice in Thailand for the analysis of grain yield (GY) and plant nitrogen uptake (NU) data. The univariate counterparts of these analyses are also applied on the ratio of these two traits (the internal nitrogen use efficiency). The advantages of the bivariate analyses are discussed in comparison to the univariate analyses on the ratio. In this case study, the bivariate mixture approach revealed that soil water availability post-flowering and N supply in soil are the potential factors defining the mixture groups.

The present work can be readily extended to the analysis of other similar traits in agriculture when the objective is to explore potential environmental conditions affecting the traits under study. In order to fully exploit the proposed methodology, a field survey is suggested as a more appropriate sampling procedure for the application of mixture models than collecting data from designed field trials.

Declaration of originality

I certify that this work contains no material which has been accepted for the award of any other degree or diploma in my name in any university or other tertiary institution and, to the best of my knowledge and belief, contains no material previously published or written by another person, except where reference has been made in the text. In addition, I certify that no part of this work will, in the future, be used in a submission in my name for any other degree or diploma in any university or other tertiary institution without the prior approval of the University of Adelaide and where applicable, any partner institution responsible for the joint award of this degree.

I give consent to this copy of my thesis, when deposited in the University Library, being made available for loan and photocopying, subject to the provisions of the Copyright Act 1968.

I also give permission for the digital version of my thesis to be made available on the web, via the University digital research repository, the Library Search and also through web search engines, unless permission has been granted by the University to restrict access for a period of time.

Common abbreviations in this thesis

BVN      bivariate normal
χ2       chi-square distribution
CV       coefficient of variation
ρ        correlation coefficient
σxy      covariance between random variables X and Y
Cov()    covariance operator
cdf      cumulative distribution function
CVx      CV of a random variable X
p        dimension of a random vector
D        Dirichlet distribution
∼        distributed as
EM       Expectation and Maximisation
µx       expected value of a random variable X
µ        expected value of a random vector
µi       expected value of the i-th component of a mixture of multivariate normal distributions
E()      expected value operator
exp{}    exponential function
F        F-distribution
Γ        gamma function
GY       grain yield
↔, ⇔     if and only if
Gi       i-th group of a mixture of distributions
i.i.d.   independent and identically distributed
IEN      internal nitrogen use efficiency
L()      log likelihood function
MLE      maximum likelihood estimate
π        mixing proportions
ψ        mixture parameters
Mult     multinomial distribution
MVN      multivariate normal distribution
NU       nitrogen uptake
g        number of groups in a mixture
τij      posterior probabilities of the j-th observation in Gi
pdf      probability density function
∝        proportional
tr       trace of a matrix
n        sample size
Φ        standard univariate normal cdf
ϕ        standard univariate normal pdf
S        straw yield
T        Student's t-distribution
N        univariate normal
σ        variance
σx       variance of a random variable X
Σ        variance-covariance matrix
Σi       variance-covariance matrix of the i-th component of a mixture of multivariate normal distributions
Var()    variance operator
θi       vector of parameters of the i-th component
W

List of Tables

1.1 Causes of N losses and associated environmental impacts
1.2 Main Nitrogen Use Efficiency (NUE) indices

2.1 Conditions for the four scenarios of Fieller's confidence sets

B.1 Journals and their impact factor for studies in the review

List of Figures

1.1 Nitrogen pathway from soil to grain
1.2 Distribution of the studies in the review by continents
1.3 Distribution of the studies in the review by the year of publication (left) and the paper citation index (right)
1.4 Typical scatter plots of grain yield and nitrogen uptake in wheat

2.1 Different shapes of the probability density function of the ratio of two jointly normal variables
2.2 Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the denominator (CVx)
2.3 Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the numerator (CVy) but same CVx
2.4 Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right)
2.5 Feasible cases of Fieller's confidence set of the ratio of expected values
2.6 Construction of a wedge given the confidence interval of the ratio of means (left) and vice versa (right)
2.7 Confidence sets of the ratio of expected values in Von Luxburg & Franz (2009)
2.8 Grain yield versus nitrogen uptake from a sample of non-irrigated rice in northeast Thailand (Naklang et al., 2006)
2.9 Probability density function of the ratio of two jointly normal variables with parameters given in Eq. 2.21
2.10 Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right) for the data in Fig. 2.8

3.1 Solutions of three clusters obtained by the EM algorithm for the case study (Chapter 4)
3.2 Spurious solution obtained by the EM algorithm when fitting 7 components to the case study data (Chapter 4)
3.3 Bimodal distribution generated from a mixture of three univariate normal components
3.4 Scatter plot of a sample from a mixture of two bivariate normal distributions

E.1 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from random starts
E.2 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from a partition provided by the K-means algorithm
E.3 Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated on a random subsample
E.4 Histogram of the internal nitrogen use efficiency data and the mixture components found after running several short runs of the EM algorithm

F.1 Cluster partition found by the EM algorithm initiated from random starts
F.2 Cluster partition found by the EM algorithm initiated from the partition obtained by the K-means algorithm
F.3 Cluster partition found by the EM algorithm initiated from simulated means
F.4 Cluster partition found by the EM algorithm initiated from the mixture estimates obtained by running the EM algorithm on a random subsample of 200 observations
F.5 Cluster partition found by the EM algorithm initiated from the mixture estimates after running several short runs of the EM algorithm

Chapter 1

Motivations and thesis outline

1.1 Feeding the world requires an efficient use of nitrogen fertilisers

The world is experiencing a rapid increase in its population, from the current 7.2 billion to an expected 9.6 billion by 2050 (UN, 2013). This growth trajectory implies an increase in the demand for cereals, both directly, as the main staple food, and indirectly, since the demand for meat is also increasing and cereals are used to feed animals (Cassman et al., 2002). According to the FAO (2009), feeding the expected 9.6 billion people will require producing an additional one billion tonnes of cereals per year. However, grain production is constrained by environmental conditions, especially by the availability of water and nitrogen (N) (Andrews et al., 2009).

After carbon (C), nitrogen is the nutrient required by plants in the largest quantities (Marschner & Marschner, 2012, p. 135). Plants take up N from soil, mainly as nitrate and ammonium, to produce their proteins and nucleotides (Xu et al., 2012). A fraction of this N is stored in the grain and removed from the field at harvest. If the straw is also removed, the N depletion of the soil is greater.

To replace this nutrient and maintain the soil fertility status, N fertilisers are applied to provide plant nutrients in addition to the indigenous N in the soil. Nitrogen fertilisers have been essential for the steady increase of the production of grain over the last decades (Follett, 2001; Hirel et al., 2007), which explains the high demand for this commodity (≈ 169 Tg/year; Smil, 1999).

Not all the N applied to soil is taken up by plants. Although variable across countries (Table 5 in Bouwman et al. (2005)), it is estimated that more than half of the N fertiliser applied is lost into the environment (Tilman et al., 2002). The losses of N are due to the fact that nitrates are not easily retained by the soil matrix (Olarewaju et al., 2009) and can be transported into water bodies through leaching and surface runoff (Raun & Johnson, 1999). Additionally, N can be lost to the atmosphere through denitrification, ammonia volatilization and gaseous emission from leaves (Fageria & Baligar, 2005).

Nitrogen losses result in large economic, energy and environmental costs. The economic loss is estimated at 15.9 billion US dollars per year (Raun & Johnson, 1999). The energy cost is related to the fertiliser manufacturing process (Haber-Bosch), in which high pressure and temperature are needed to convert atmospheric N2 into ammonia (NH3) (Galloway et al., 2003). Finally, the environmental impacts (Table 1.1) are related to the pollution of soil, water and atmosphere (Ladha et al., 2005).

Minimising the losses of N is thus essential to satisfy the economic and population growth needs without compromising the future of our planet. This is the basis of sustainable agriculture (Tilman et al., 2002), stated by Spiertz (2010) as the 3-P principle, which consists of meeting ‘population’, ‘profit’ and ‘planet’ requirements. To meet the 3-P principle, it is necessary to be efficient by achieving the maximum possible yields with an optimum and responsible use of fertilisers and natural resources. To quantify the N use efficiency, scientists have defined a variety of indices (Table 1.2) and identified strategies to increase them (Section 1.3). However, the expected effects of these strategies are not always achieved since the N utilisation depends on a myriad of complex interactions between plant, soil, climate and microorganisms affecting the physiological processes in plants (Figure 1.1).

Table 1.1: Causes of N losses and associated environmental impacts.

| Cause of N loss | Description | Environmental impact |
|---|---|---|
| Denitrification | Conversion of nitrate (NO3−) into N gases (N2, NO, N2O) | Global warming and atmosphere ozone depletion |
| Ammonia volatilization | Conversion of urea (CO(NH2)2) into N gases (NH3) | Acid rain, which leads to soil acidification |
| Gaseous plant emission | Release of NH3 from leaves | Acid rain, which leads to soil acidification |
| Leaching | N is transported downward after irrigation or strong precipitations | Groundwater contamination and eutrophication∗ |
| Surface runoff | Water flows over the soil surface and moves N into streams and lakes | Eutrophication |

∗An excessive amount of nitrates in water results in a great increase of algae causing oxygen depletion and putrefaction of the water. Information has been compiled from Fageria & Baligar (2005); Ladha et al. (2005); Raun & Johnson (1999).

Figure 1.1: Nitrogen pathway from soil to grain. [Compiled from Fageria & Baligar (2005); Marschner & Marschner (2012); Marschner (2013, pers. comm., 3 January); Xu et al. (2012)]

The figure shows five consecutive stages: N in soil → available N → N absorbed → N taken up → N in grain.

N in soil: Total amount of N in soil: indigenous N and external sources from organic and inorganic fertilisers.

Available N: N potentially taken up by plants. The N availability is subject to a range of factors such as the amount and source of N in soil, the availability of other nutrients (e.g. C:N ratio), the activity of microorganisms (mineralization and immobilization processes), climate and soil characteristics (e.g. pH and moisture content).

N absorbed: N taken up by roots. N absorption depends on the availability of nutrients, climate factors (e.g. availability of water), the crop demand for nutrients, which varies with the type of crop, the growth stage of the plant, the plant genotype and the root distribution.

N taken up: Concentration of N multiplied by the dry weight of plants. It differs from N absorbed due to leaf senescence and fall, insect attacks, the root turnover and gaseous plant emissions through the plant canopy.

N in grain: Concentration of N in grain multiplied by the grain yield. It can be affected by temperature and drought stresses, variety, diseases etc.

1.2 Nitrogen efficiency measures

Nitrogen in soil has to undergo different steps before being used in grain production (Figure 1.1). To quantify the efficiency of N utilisation in these steps, plant and soil scientists have suggested several Nitrogen Use Efficiency (NUE) indices. Table 1.2 displays the most common NUE indices found in plant and soil science literature.

Among various NUE indices, this thesis focuses on the statistical analysis of internal N use efficiency, $IE_N$ (Eq. 1.1). This index measures the ability of plants to convert N content in aboveground biomass into grain yield.

\[ IE_N = \frac{GY}{NU} \tag{1.1} \]

where $GY$ (kg/ha) is grain yield and $NU$ (kg/ha) is N uptake, i.e. the content of N in aboveground plant parts. In field trials, $NU$ is commonly measured (e.g. Naklang et al., 2006) as follows¹:

\[ NU = \frac{m\,[N]_G\,GY + [N]_S\,S}{1000} \quad \text{(kg/ha)} \tag{1.2} \]

where $[N]_G$ is N concentration in dry grain (g/kg), $[N]_S$ is N concentration in straw (g/kg), $GY$ is grain yield (kg/ha), $S$ is straw yield (kg/ha) and $m$ is the standard moisture correction factor, equal to 0.86.

¹Notice that no direct measures of NU are available. However, the fact that NU is a derived variable from others is not investigated in this thesis.

Internal nitrogen use efficiency can be directly linked to another important agricultural efficiency measure, the so-called harvest index, $HI$ (Eq. 1.3):

\[ HI = \frac{GY}{GY + S} \tag{1.3} \]

The pressure of the last decades to increase $GY$ has made $HI$ an important trait for variety selection in plant breeding. Although $HI$ emphasises the partition of carbon (C) in plant (Sinclair, 1998), the importance of N for grain production induces a close association between $HI$ and $IE_N$ (Sinclair, 1998). In particular, by Eq. 1.1, 1.2 and 1.3, and noting that $S/GY = HI^{-1} - 1$, the following relationship is derived:

\[ IE_N^{-1} = \frac{1}{1000}\left[\, m\,[N]_G + [N]_S \left( HI^{-1} - 1 \right) \right] \]
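As a numerical check of the relationship between HI and the internal N use efficiency, the following minimal Python sketch evaluates Eq. 1.1 to 1.3 for a single plot and confirms that both sides of the identity agree. The function names and the plot values (yields and N concentrations) are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
import math

M = 0.86  # standard moisture correction factor from Eq. 1.2


def nitrogen_uptake(gy, s, n_grain, n_straw, m=M):
    """Eq. 1.2: N uptake (kg/ha) from yields (kg/ha) and N concentrations (g/kg)."""
    return (m * n_grain * gy + n_straw * s) / 1000.0


def internal_efficiency(gy, nu):
    """Eq. 1.1: internal N use efficiency (kg grain per kg N taken up)."""
    return gy / nu


def harvest_index(gy, s):
    """Eq. 1.3: share of grain in the total of grain plus straw."""
    return gy / (gy + s)


def inverse_ie_from_hi(hi, n_grain, n_straw, m=M):
    """Right-hand side of the identity linking 1/IE_N to HI."""
    return (m * n_grain + n_straw * (1.0 / hi - 1.0)) / 1000.0


# Hypothetical plot: values are illustrative assumptions only.
gy, s = 3000.0, 4000.0          # grain and straw yield (kg/ha)
n_grain, n_straw = 11.0, 5.0    # N concentration in grain and straw (g/kg)

nu = nitrogen_uptake(gy, s, n_grain, n_straw)
ie = internal_efficiency(gy, nu)
hi = harvest_index(gy, s)

# Both routes to 1/IE_N should agree up to floating-point error.
print(nu, ie, hi, math.isclose(1.0 / ie, inverse_ie_from_hi(hi, n_grain, n_straw)))
```

The check makes explicit that, for fixed N concentrations, a higher harvest index implies a higher internal efficiency, since the bracketed term decreases as HI increases.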

Table 1.2: Main Nitrogen Use Efficiency (NUE) indices.

| NUE index | Definition | Formula |
|---|---|---|
| Partial factor productivity ($PFP_N$) | Ratio of the total grain yield to the level of N fertiliser applied. | $PFP_N = \frac{GY}{F_N}$ (kg kg$^{-1}$) |
| Agronomic efficiency ($AE_N$) | Ratio of the increment in grain yield to the level of N fertiliser applied. It differs from $PFP_N$ in that it provides information about the increase of grain due to fertilising. | $AE_N = \frac{GY - G_0}{F_N}$ (kg kg$^{-1}$) |
| Recovery efficiency ($RE_N$) | Ratio of the increment of N uptake to the level of N fertiliser applied. It measures the ability of plants to take up N from applied fertilisers. | $RE_N = \frac{NU - N_0}{F_N}$ (kg kg$^{-1}$) |
| Internal N use efficiency ($IE_N$) | Ratio of the total grain yield to the total N uptake. It measures the ability of plants to utilise N content in biomass for producing grain. | $IE_N = \frac{GY}{NU}$ (kg kg$^{-1}$) |
| Physiological efficiency ($PE_N$) | Ratio of the increment in grain yield to the increment in N uptake due to fertilising. It measures the ability of plants to utilise the increment in N uptake due to fertilising to produce grain. | $PE_N = \frac{GY - G_0}{NU - N_0}$ (kg kg$^{-1}$) |

NUE: Nitrogen Use Efficiency. $GY$: grain yield. $F_N$: level of fertiliser applied. $G_0$: grain yield in plots with no fertiliser (indigenous N only). $NU$: N uptake measured in aboveground biomass. $N_0$: N uptake measured in aboveground biomass in a plot with no fertiliser. Compiled from Dobermann (2005).
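The five indices in Table 1.2 can be computed side by side from the same plot measurements. The sketch below is a minimal Python illustration; the function name `nue_indices` and the input values are hypothetical. It also checks a relationship that follows directly from the formulas in the table: the agronomic efficiency equals the product of the recovery and physiological efficiencies.

```python
import math


def nue_indices(gy, g0, nu, n0, fn):
    """Compute the NUE indices of Table 1.2 (all in kg per kg).

    gy: grain yield of the fertilised plot (kg/ha)
    g0: grain yield of the unfertilised plot (kg/ha)
    nu: N uptake of the fertilised plot (kg/ha)
    n0: N uptake of the unfertilised plot (kg/ha)
    fn: level of N fertiliser applied (kg/ha)
    """
    return {
        "PFP": gy / fn,               # partial factor productivity
        "AE": (gy - g0) / fn,         # agronomic efficiency
        "RE": (nu - n0) / fn,         # recovery efficiency
        "IE": gy / nu,                # internal N use efficiency
        "PE": (gy - g0) / (nu - n0),  # physiological efficiency
    }


# Hypothetical measurements, for illustration only.
idx = nue_indices(gy=4000.0, g0=2500.0, nu=60.0, n0=35.0, fn=50.0)

for name, value in idx.items():
    print(f"{name}: {value:.2f}")

# AE = RE * PE follows algebraically from the definitions in Table 1.2.
print(math.isclose(idx["AE"], idx["RE"] * idx["PE"]))
```

Note that only the internal N use efficiency and the partial factor productivity can be computed without an unfertilised control plot; the other three indices require G0 or N0.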

1.3 Strategies for improving the uptake and utilisation of nitrogen by cereals

Maximising the production of grain and optimising the use of N fertiliser are major priorities for soil and plant scientists. In the last 70 years, substantial effort has been undertaken to develop strategies which enable an increase in the uptake and utilisation of N for grain production (Evenson & Gollin, 2003). These strategies have been focused on the improvement of agronomic practices and plant genotypes (Cassman et al., 2002; Raun & Johnson, 1999; Spiertz, 2010; Tilman et al., 2002):

• Improvement of agronomic practices: better matching of the fertiliser application to the crop demand, crop choice, crop rotation with legumes, utilisation of slow release fertilisers, conservation tillage systems, cover crops with legumes, utilisation of organic compost and green manures, water management and sowing time management (Fageria & Baligar, 2005). These practices may differ depending on the climatic region. For instance, in arid regions the amount of N applied is lower than in wetter regions because plants tend to grow less. Furthermore, in wetter regions N loss via leaching or denitrification is high so more N is applied, although in small but frequent doses (Marschner 2014, pers. comm., 31 October).

• Improvement of plant genotypes: plant breeding or genetic engineering (see Marschner & Marschner (2012, Section 6.1.6)). At present, several families of genes which can potentially increase NUE have been identified in rice (Vinod & Heuer, 2012) and wheat (Foulkes et al., 2009; Hawkesford, 2014). However, these genotypes have been mostly tested in laboratory conditions and there are not enough field studies to assess the genotype by environment interactions (Hawkesford, 2014). The need for more research and the opposition of part of the population to genetically modified food make this strategy a long-term prospect.

Improving N efficiency has also been a priority at the Waite Campus, where extensive research across disciplines has been undertaken to develop strategies for increasing the efficiency of nitrogen utilisation by cereals (e.g. Australian Centre for Plant Functional Genomics, 2014; Waite Institute, 2014a; Waite Institute, 2014b). A comprehensive discussion about the best approaches to increase the efficiency is beyond the scope of this thesis. This thesis is focused on implementing new statistical methods which can provide more insight into the conversion of NU into GY and how this conversion is affected by environmental factors. The identification of such factors can help plant and soil scientists to better understand the conversion process of NU into GY and to develop specific management practices for increasing IEN.

With these objectives in mind, a literature search for current statistical methods in the analysis of GY and NU in published agricultural research was carried out. The objectives of this literature search were to determine: 1) the amount and format of published GY and NU data, 2) typical trends between the two traits and 3) the current most common statistical techniques employed to analyse these data.

1.4 Review of grain yield and nitrogen uptake analyses in agricultural research

1.4.1 Studies selected for the review

1.4.1.1 Searching criteria

A selection of 100 studies (see Appendix A) from plant and soil science literature was collected using the search engines of Web of Science and Google Scholar and employing the following three searching criteria:

1. A combination of keywords such as yield, nitrogen, uptake, internal efficiency, physiological efficiency, utilisation efficiency, rice or wheat was used for topic searching, e.g. yield and nitrogen and (rice or wheat). The search was restricted by research affiliation, in particular to the International Rice Research Institute, the Indian Agricultural Research Institute and all Australian affiliations (CSIRO, University of Adelaide, etc.). These are internationally recognised institutions in cereal research and hence a source of reliable data. Furthermore, studies performed before 1980 were not included in the selection.

2. Articles which were referenced by the previously selected articles and articles which cited them. These manuscripts were included to widen the searching framework.

3. Studies containing both GY and NU data and performed in the field (farmers' fields or research station fields).

1.4.1.2 Main features of the studies selected

In this section, I present the most important features of the studies evaluated in relation to: 1) journals in which the studies were published, 2) places where the studies were conducted and 3) year of publication and manuscript citation index.

Most of the journals in which the studies were published belong to the category of Agronomy followed by Agriculture Multidisciplinary (Table B.1). Within these categories, an impact factor close to or higher than three is considered to be high. Therefore, 51% of the studies selected are considered to be published in high impact factor journals (Table B.1): Agriculture, Ecosystems and Environment (1%), Molecular Breeding (1%), European Journal of Agronomy (8%), Plant and Soil (13%) and Field Crops Research (28%). These journals are placed in the top ten of their categories, out of a total of 78 journals in Agronomy, 57 in Agriculture Multidisciplinary and 195 in Plant Science.

Most of the studies selected were carried out in Asia (72%), mainly in China (30%), India (17%) and the Philippines (10%). About 10%, 9%, 4% and 3% of the studies selected were performed in Oceania, Europe, America and Africa, respectively. The remaining 2% of the studies were carried out in several countries belonging to at least two continents (Others), see Figure 1.2. This distribution is explained by our searching criteria, which gave priority to well-recognised Asian and Australasian institutions.

The majority of the studies were published in the last 8 years (Figure 1.3, left). This partially explains why 31% of the manuscripts have been cited fewer than 3 times and 59% have fewer than 20 citations (Figure 1.3, right).

Figure 1.2: Distribution of the studies in the review by continents. [Others refers to studies which present experiments carried out in at least two countries from different continents.]

Figure 1.3: Distribution of the studies in the review by the year of publication (left) and the paper citation index (right).

1.4.2 Amount and format of grain yield and nitrogen uptake data

There is a vast amount of experimental data on GY and NU collected from designed field trials. The data are mostly presented in: 1) tables, which display the means of GY, NU or IEN across treatments, or 2) scatter plots in which GY is plotted against NU. In the latter, least-squares exponential or polynomial regressions are commonly fitted to model the trend of GY on NU (Figure 1.4, left). Alternatively, two straight envelope lines delimiting the points of the scatter plot are displayed (Figure 1.4, right).

1.4.3 Relationship between grain yield and nitrogen uptake

The trend of GY on NU usually follows a saturation curve. At low levels of NU, a small increment of the latter sharply increases GY; however, this effect flattens out at high levels of NU (Figure 1.4, left). This pattern has been shown in Dobermann & Cassman (2002); Naklang et al. (2006); Witt et al. (1999). This shape arises because low levels of N content in the plant limit GY (Xu et al., 2012). However, once the plant has accumulated enough N, other factors (e.g. other major or minor nutrients, water or low rate of photosynthesis (Foulkes et al., 2009)) may be more limiting than N. The flattening effect is also produced because there is a maximum quantity of grain that plants can produce (Fowler, 2003; Otteson et al., 2007). If the amount of N in the plant is excessive, N may have a negative effect, decreasing the grain production (Goyal & Huffaker, 1984, Chapter 6). Therefore, there is a change in the correlation between the variables GY and NU (Figure 1.4, left). The correlation reveals how N taken up is utilised for grain, and it changes depending on the environmental conditions, agronomic practices and plant genotypes. We will return to this issue later in Chapters 3 and 4.

Figure 1.4: Typical scatter plots of grain yield and nitrogen uptake in wheat. [A curvilinear regression has been fitted in the graph to the left (Takahashi et al., 2007). The open and closed points refer to plots with and without compost application, respectively. In the graph to the right, two lines delimiting the points of the scatter plot are presented (Liu et al., 2006). The circles and triangles refer to plots with and without fertiliser application, respectively.]

1.4.4 Common methods of analysis of grain yield and nitrogen uptake field data and their limitations

The most common techniques currently used to analyse GY and NU data in agricultural science journals can be divided into 1) statistical techniques and 2) biological quantitative models. The limitations of these techniques in providing additional understanding of the conversion of NU into GY are discussed below. The standard statistical software packages are also detailed.

Statistical techniques:

• Least-squares regressions. This technique is used to model the trend of GY on NU. Around 16% of the studies in our literature selection used simple least-squares polynomial or exponential regressions. The main objective is to check whether both variables are related and what type of relationship they have, e.g. linear, curvilinear or linear-plateau. However, it is not clear whether the observed scatter (e.g. Figure 1.4) is a direct functional response of GY to NU and/or an overlay of growth processes across different conditions. Furthermore, the least-squares regressions of GY on NU ignore the external factors affecting the relationship between both variables, which are of great interest to farmers and have to be taken into account to understand the utilisation of NU for GY. In addition, one of the necessary conditions for simple least-squares regression is that the variance along the fitted line is constant (homoscedasticity of the error). However, as observed in Figure 1.4 (left), this requirement is not necessarily fulfilled.

• Marginal analyses on GY and NU data. These analyses are mainly used to test for differences between the means of GY and NU across experimental treatments. Around 80% of the studies in our selection performed marginal analyses of GY and NU. In the majority of these studies, the technique used is the univariate analysis of variance followed by a post-hoc analysis, such as the Least Significant Difference test or Duncan's Multiple Range test. However, such analyses ignore the joint behaviour of the variables, which seriously limits any in-depth understanding of the conversion of NU into GY.

• Analyses on IEN. These analyses are mainly applied to test whether there is any statistical difference between the means of the ratio across treatments. Internal N use efficiency is computed at each sampling unit (at which NU and GY are measured) and commonly analysed by the univariate analysis of variance. Since the distribution of the ratio of two jointly normal variables is a mixture² of non-normal heavy-tailed distributions (Marsaglia, 1965, 2006), the analysis of the ratio may violate the requirements of normality and homogeneity of error variances; thus, inferential conclusions may not be reliable. Around 40% of the studies in our literature selection carried out analyses on the ratio. Just four of them (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011; Tétard-Jones et al., 2013) reported checking the deviation from the normality assumption, and three (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011) the homogeneity of error variances.

Analyses of variance were mostly employed to assess the effect of different fertiliser treatments on GY, NU or IEN. However, N utilisation for grain production depends on the available N in soil rather than on the amount of N applied (Figure 1.1). The fraction of available N can differ substantially from the applied N due to complex environmental interactions between climate, soil and plant (Marschner & Marschner, 2012, p.315). Thus, plots receiving the same fertiliser treatment may present different levels of available N, resulting in non-uniform (ill-defined) realisations of the treatments. Special care is therefore required when designing such field trials, as well as tight control of the agronomic practices, to ensure that treatment applications are as consistent as possible. In addition, in order to better understand the interactions of these fertiliser treatments with non-controlled environmental conditions in the field, analyses complementary to the ones commonly used (e.g. linear mixed models) may be required.

² See Chapter 3.

Biological quantitative models:

Biological models like the Quantitative Evaluation of the Fertility of Tropical Soils (QUEFTS) (Janssen et al., 1990) or crop growth simulation models (see Cassman et al. (1998, Section 6.5) for a review) describe internal processes in plants and are based on theoretical equations derived from physical principles. Statistics is used to validate such models. Consequently, these models are deterministic and beyond the scope of this thesis. In our literature selection, 8% of the studies used the QUEFTS model and 3%³ employed crop growth simulation models.

Statistical software

The statistical software packages used to analyse GY and NU, together with the percentage of studies in our literature search employing them, are as follows: Statistical Analysis System, SAS (34%) (SAS Institute, 2013), SPSS (8%) (SPSS, 1997), Genstat (6%) (VSN International, 2012), IRRISTAT (5%) (IRRI, 1994), Excel (2%) (Excel, 2010), R (2%) (R Core Team, 2012) and Others (2%). The rest (41%) of the studies did not provide any details on the statistical software used. This percentage corresponds to studies which developed biological quantitative models (11%) or did not mention the software employed (30%).

1.5 Objectives of the thesis

Statistical techniques currently used in plant and soil science publications to model GY and NU data have serious limitations in contributing to a fundamental understanding of the conversion process of NU into GY. In this thesis, I argue that a better approach is to analyse GY and NU data jointly with bivariate analyses. For instance, bivariate linear mixed models on (GY, NU) are more appropriate for comparing treatment effects than univariate linear models on GY, NU or IEN. Recently, Ganesalingam et al. (2013) proposed bivariate mixed models as an alternative to univariate mixed models on a ratio for the analysis of plant survival data. As they showed, the bivariate approach preserved the information on the original variables, allowed modelling the spatial correlation for each trait, better utilised the experimental data and increased the accuracy of predictions for variety survival.

³ The percentages presented in this section do not sum to 100% because some studies performed more than one type of analysis.

In designed field experiments, both GY and NU can be greatly affected by non-controlled environmental conditions. These environmental conditions may often be confounded with designed factors or interact with them, resulting in non-uniform treatments. This may complicate the interpretation of bivariate mixed model analyses. Our hypothesis is that GY and NU field data collected across a range of environments can reflect a heterogeneous population composed of subpopulations. Each of these subpopulations (clusters) can be considered a separate environment. The identification of such clusters and the close inspection of potential factors defining them can shed additional light on the nitrogen utilisation process.

Finite mixture models of bivariate Gaussian distributions can be used to identify such clusters and are proposed here as a complementary analysis to bivariate mixed models for data collected in field experiments. The main benefits envisaged for the analysis of the internal N efficiency traits are as follows. This approach 1) preserves information on GY and NU, 2) acknowledges the change in the correlation between GY and NU across environments (clusters), 3) avoids dealing with the ratio distribution, 4) allows estimation of IEN within each cluster and 5) provides additional insight into potential environmental conditions affecting the mechanism of N utilisation for grain production. In terms of agronomic studies, the identification of clusters is useful to determine samples which belong to the same environment. The identification of the factors defining each of the environments could be useful to implement agronomic practices which counteract potential adverse environmental conditions. For instance, the lack of rain at a certain growth stage of the plant could be identified as one of the potential factors affecting N utilisation. To counteract drought stress in non-irrigated systems, farmers could cover the crops with straw to retain the moisture of the soil. The identification of these environmental conditions is also important to improve further experimental designs or to include them in further statistical models, for instance, to predict the best sowing time according to the local conditions of the field. The objective of this thesis is to investigate the benefits of the bivariate mixture methodology as a complementary analysis to bivariate mixed models for field trials in the presence of strong environmental conditions. At this stage, mixture models should not be applied alone because GY and NU data are mostly collected from designed field trials. Data from such trials may violate the independence assumption of mixture models⁴. Despite this limitation, I believe that the technique can be useful for identifying environmental factors which may overshadow, or interact with, treatment effects. Thus, the combination of bivariate mixed and mixture models can provide a more insightful interpretation of the conversion process of NU into GY in field trials.
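To make the clustering idea concrete, the following minimal Python sketch computes the posterior probability that a single (NU, GY) observation belongs to each component of a two-component bivariate Gaussian mixture. The component weights, means, standard deviations and correlations are hypothetical values chosen purely for illustration; they are not estimates from this thesis, and in practice they would have to be estimated from the data (the estimation approaches are reviewed in Chapter 3).

```python
import math

def bvn_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density with means (mx, my), sds (sx, sy), correlation rho."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm

def responsibilities(x, y, components):
    """Posterior membership probabilities of (x, y) under a finite mixture."""
    weighted = [c["weight"] * bvn_pdf(x, y, *c["params"]) for c in components]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical two-cluster mixture on (NU, GY), arbitrary units.
# params = (mean NU, mean GY, sd NU, sd GY, correlation)
components = [
    {"weight": 0.6, "params": (40.0, 2.5, 8.0, 0.5, 0.8)},
    {"weight": 0.4, "params": (90.0, 4.0, 15.0, 0.6, 0.3)},
]

post = responsibilities(45.0, 2.7, components)
print(post)  # membership probabilities for the two hypothetical clusters
```

Each cluster keeps its own correlation between NU and GY, which is the feature of the mixture approach highlighted in benefit 2) above.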

1.6 Thesis outline

In this chapter I have presented the biological background and the motivation for this project. The remainder of the thesis is presented in four chapters.

Chapter 2 presents the distributional properties of ratios of jointly normal variables. The objective of this chapter is to draw the attention of the reader to the pitfalls of analysing ratios with simple statistical techniques. The expressions of the probability density function (pdf) of these types of ratios are reviewed, highlighting the mixture nature and possible heavy-tailedness of their distribution. I also review the main estimators of the ratio of expected values and their properties. Finally, analytical and graphical procedures for calculating confidence sets of the ratio of expected values are reviewed.

Chapter 3 sets out the theoretical fundamentals of finite mixture models of bivariate Gaussian distributions and provides the theoretical framework of the methodology of this thesis. I review the frequentist and Bayesian approaches for mixtures, as well as the difficulties encountered when fitting mixture models of Gaussian distributions with heteroscedastic components and common recommendations for overcoming them.

⁴ See Chapter 3.

In Chapter 4, the methodology of bivariate mixture models together with bivariate mixed models for the analysis of (NU, GY) is demonstrated for a field study reported in Naklang et al. (2006). The benefits of the bivariate analyses (mixture and mixed models) are discussed in comparison with their respective univariate counterparts on IEN. This chapter is presented as a manuscript, following the required format for submission to the Australian and New Zealand Journal of Statistics.

Finally, Chapter 5 presents the general conclusions of the thesis. I argue that, ideally, the methodology of bivariate mixture models should be applied in field surveys to fully exploit the potential of the technique as a means of identifying environmental factors affecting NU and GY. Simulation studies to assess the coverage of Fieller's confidence intervals (Fieller, 1954) of the ratio of expected values for each component of the mixture are proposed as a future line of research. Other research gaps identified during the development of the present project are discussed.

Chapter 2

Ratios of jointly normal variables

2.1 Introduction to the ratio of jointly normal variables

Ratios are commonly used in agricultural research to quantify one variable with respect to another. For instance, the efficiency of N utilisation by plants is quantified by several indices (Table 1.2). However, researchers in agriculture and biometricians are not always aware of the distributional properties of ratios and choose to apply simple statistical methods, which assume normality and homoscedasticity of the error variance, to the ratio observations. Violations of these assumptions may lead to unreliable inferences.

A ratio can take atypically large values if its denominator is close to zero. This results in a higher probability of outliers and in heavy-tailedness. In particular, if (X, Y) are jointly normal, the probability density function (pdf) of the ratio R = Y/X is a mixture¹ of two heavy-tailed distributions, one of the components being Cauchy distributed (Marsaglia, 1965, 2006)².

¹ See Chapter 3.
² Further details on the study of Marsaglia (1965) were given in Marsaglia (2006).

The presence of a Cauchy component in the pdf of R results in the non-existence of the expected value, E(Y/X), and the variance, Var(Y/X). Since E(Y/X) does not exist, E(Y)/E(X) is used for inference purposes (e.g. Lai et al., 2004). The confidence sets of E(Y)/E(X) comprise asymmetric bounded intervals, unbounded intervals or the entire real line (Fieller, 1954; Von Luxburg & Franz, 2009). These solutions are derived in the following situations (Von Luxburg & Franz, 2009):

• The denominator is far from zero. Then, there are no problems in computing the ratio and the confidence sets of E(Y)/E(X) are bounded intervals.

• The numerator is far from zero but the denominator is not. Then, the denominator can take positive or negative values, and E(Y)/E(X) can result in arbitrarily large values of unknown sign. In this case, the confidence sets are of the form ]−∞, q1] ∪ [q2, ∞[ with q1 < 0 and q2 > 0. The only possibility for E(Y)/E(X) to be in ]q1, q2[ is by having a small numerator or a large denominator, which is not possible in this assumed situation.

• The denominator and the numerator can both take values close to zero. Then, we have an indeterminacy of the type 0/0 and E(Y)/E(X) can take any value. Thus, the confidence set is the entire real line.
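These three configurations correspond to the sign of the leading coefficient and of the discriminant of the quadratic inequality in α that defines Fieller's confidence set (Eq. 2.17 in Section 2.4.2.1). The following Python sketch classifies the set from sample summaries. The function name, the numerical inputs and the critical value t = 2 are illustrative assumptions (the exact t quantile would have to be taken from tables or statistical software), and the degenerate case of a zero leading coefficient is not handled.

```python
import math

def fieller_confidence_set(mean_x, mean_y, var_x, var_y, cov_xy, n, t_crit):
    """Classify the confidence set of E(Y)/E(X) from the quadratic A*α² + B*α + C <= 0."""
    t2 = t_crit ** 2
    a = mean_x ** 2 - t2 * var_x / n
    b = -2.0 * mean_x * mean_y + 2.0 * t2 * cov_xy / n
    c = mean_y ** 2 - t2 * var_y / n
    if a == 0.0:
        raise ValueError("degenerate case (leading coefficient 0) not handled")
    disc = b * b - 4.0 * a * c
    if a > 0.0:
        if disc < 0.0:
            raise ValueError("inconsistent inputs (covariance not positive definite?)")
        # Denominator far from zero: bounded interval between the two roots.
        r1 = (-b - math.sqrt(disc)) / (2.0 * a)
        r2 = (-b + math.sqrt(disc)) / (2.0 * a)
        return "bounded", (min(r1, r2), max(r1, r2))
    if disc > 0.0:
        # Denominator close to zero: complement of the open interval ]q1, q2[.
        r1 = (-b - math.sqrt(disc)) / (2.0 * a)
        r2 = (-b + math.sqrt(disc)) / (2.0 * a)
        return "exclusive unbounded", (min(r1, r2), max(r1, r2))
    # Numerator and denominator both close to zero.
    return "entire real line", None

# Illustrative sample summaries (n = 25, no covariance), critical value assumed to be 2.
case_far = fieller_confidence_set(10.0, 20.0, 1.0, 4.0, 0.0, 25, 2.0)
case_denominator = fieller_confidence_set(0.1, 20.0, 1.0, 4.0, 0.0, 25, 2.0)
case_both = fieller_confidence_set(0.1, 0.1, 1.0, 1.0, 0.0, 25, 2.0)
print(case_far, case_denominator, case_both)
```

In the second case the returned pair (q1, q2) delimits the excluded interval, so the confidence set is ]−∞, q1] ∪ [q2, ∞[.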

This chapter reviews the distributional properties of the ratio R = Y/X, where (X, Y) follows a bivariate normal (BVN) distribution with the following expected value (µ) and variance-covariance matrix (Σ):

\[
\mu = (\mu_x, \mu_y); \qquad
\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}
\tag{2.1}
\]

The properties reviewed in this chapter apply directly to the distribution of IEN. Recall from Section 1.2 that IEN is defined as the ratio of GY to NU. In field trials, GY and NU are measured at harvest and cumulatively on each experimental plot. Conditional on major factors and assuming that the plant measurements are weakly correlated, it can be assumed that at the plot level both GY and NU are normally distributed (see Central Limit Theorem in Cramer, 1946, p. 219). Under the same assumptions, the Central Limit Theorem can be extended to the bivariate case and one can consider (NU, GY) to be jointly normal (see Cramer, 1946, p. 286).

The chapter is structured as follows. Firstly, I present the main results regarding the pdf of R and review the cases for which this distribution is approximately normal. Then, I discuss the most typical estimators of E(Y)/E(X) and their performance. Finally, I review analytical and graphical procedures (Fieller's rule and a geometric approach) for deriving the confidence sets of E(Y)/E(X).

2.2 Distribution of the ratio: history and properties

2.2.1 Geary (1930) and Fieller (1932) expressions of the pdf

The research on ratios of two jointly normal variables goes back to the 1930s, when Geary (1930) formulated the analytical expression of the pdf of R for the particular case of µ = (0, 0):

\[
f_R(r) = \frac{\sigma_x \sigma_y \sqrt{1-\rho^2}}{\pi\left(\sigma_x^2 r^2 - 2\rho\sigma_x\sigma_y r + \sigma_y^2\right)}
\tag{2.2}
\]

where ρ is the correlation coefficient. Since then, several authors have provided an expression of the pdf of R without restrictions on the means. The first to suggest a solution with µ being an arbitrary real vector was Fieller (1932). The solution in Fieller (1932) was re-expressed in Hinkley (1969):

\[
f_R(r) = \frac{b(r)\,d(r)}{\sqrt{2\pi}\,\sigma_y\sigma_x\,a^3(r)}
\left[ \Phi\left\{ \frac{b(r)}{\sqrt{1-\rho^2}\,a(r)} \right\} - \Phi\left\{ -\frac{b(r)}{\sqrt{1-\rho^2}\,a(r)} \right\} \right]
+ \frac{\sqrt{1-\rho^2}}{\pi\,\sigma_x\sigma_y\,a^2(r)} \exp\left\{ -\frac{c}{2(1-\rho^2)} \right\}
\tag{2.3}
\]

where:

\[
\begin{aligned}
a(r) &= \sqrt{\frac{r^2}{\sigma_y^2} - \frac{2\rho r}{\sigma_x\sigma_y} + \frac{1}{\sigma_x^2}} \\
b(r) &= \frac{\mu_y r}{\sigma_y^2} - \frac{\rho(\mu_y + \mu_x r)}{\sigma_x\sigma_y} + \frac{\mu_x}{\sigma_x^2} \\
c &= \frac{\mu_y^2}{\sigma_y^2} - \frac{2\rho\mu_x\mu_y}{\sigma_x\sigma_y} + \frac{\mu_x^2}{\sigma_x^2} \\
d(r) &= \exp\left\{ \frac{b^2(r) - c\,a^2(r)}{2(1-\rho^2)\,a^2(r)} \right\}
\end{aligned}
\]

and Φ{·} and exp{·} are the cumulative distribution function (cdf) of the standard univariate normal and the exponential function, respectively.

Fieller (1932), Geary (1930) and Hinkley (1969) derived the pdf of R by performing the following change of variable:

\[
X = X \quad \text{and} \quad Y = RX,
\]

\[
f_{X,R}(x, r) = \phi_{X,Y}(x, rx \mid \mu, \Sigma)\,|x|,
\]

where φ_{X,Y} denotes the BVN pdf. The marginal density f_R(r) is calculated by integrating f_{X,R} with respect to x. Notice that if µx = µy = 0, Eq. 2.3 reduces to Eq. 2.2. Furthermore, if ρ = 0 and σx = σy, Eq. 2.2 is the standard Cauchy density (Pham-Gia et al., 2006). The latter example shows how much the distribution of R can differ from a normal.
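As a numerical illustration of Eq. 2.2 and Eq. 2.3, the Python sketch below evaluates both densities and checks the two special cases just mentioned. The function names are illustrative and the parameter values are taken from the right-hand panel of Figure 2.1; the trapezoid rule over [−2, 5] is an ad hoc choice that assumes the mass outside this range is negligible for these parameters.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def geary_pdf(r, sx, sy, rho):
    """Eq. 2.2: pdf of R = Y/X when both means are zero."""
    return (sx * sy * math.sqrt(1.0 - rho ** 2)
            / (math.pi * (sx ** 2 * r ** 2 - 2.0 * rho * sx * sy * r + sy ** 2)))

def hinkley_pdf(r, mx, my, sx, sy, rho):
    """Eq. 2.3: pdf of R = Y/X for arbitrary means."""
    one_m_rho2 = 1.0 - rho ** 2
    a = math.sqrt(r ** 2 / sy ** 2 - 2.0 * rho * r / (sx * sy) + 1.0 / sx ** 2)
    b = my * r / sy ** 2 - rho * (my + mx * r) / (sx * sy) + mx / sx ** 2
    c = my ** 2 / sy ** 2 - 2.0 * rho * mx * my / (sx * sy) + mx ** 2 / sx ** 2
    d = math.exp((b ** 2 - c * a ** 2) / (2.0 * one_m_rho2 * a ** 2))
    z = b / (math.sqrt(one_m_rho2) * a)
    first = (b * d / (math.sqrt(2.0 * math.pi) * sx * sy * a ** 3)
             * (std_normal_cdf(z) - std_normal_cdf(-z)))
    second = (math.sqrt(one_m_rho2) / (math.pi * sx * sy * a ** 2)
              * math.exp(-c / (2.0 * one_m_rho2)))
    return first + second

def trapezoid(f, lo, hi, steps):
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

# Zero means: Eq. 2.3 should coincide with Eq. 2.2.
gap_zero_means = abs(hinkley_pdf(0.7, 0.0, 0.0, 8.0, 24.0, 0.16)
                     - geary_pdf(0.7, 8.0, 24.0, 0.16))
# rho = 0 and equal sds: Eq. 2.2 should be the standard Cauchy density.
gap_cauchy = abs(geary_pdf(0.3, 2.0, 2.0, 0.0) - 1.0 / (math.pi * (1.0 + 0.3 ** 2)))
# Parameters (30, 31, 4, 5, 0.8): the density should integrate to about 1.
area = trapezoid(lambda r: hinkley_pdf(r, 30.0, 31.0, 4.0, 5.0, 0.8), -2.0, 5.0, 7000)
print(gap_zero_means, gap_cauchy, area)
```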

The expressions of the pdf of R in Fieller (1932) and Hinkley (1969) are complex and do not give much insight into its distributional shape. Marsaglia (1965, 2006) expressed this pdf as a mixture of two heavy-tailed distributions. Marsaglia's expression provides more intuition about the shape of the distribution, which can differ greatly from the normal and can present skewness, bimodality and heavy tails.

2.2.2 Marsaglia (1965, 2006) expression of the pdf

The motivating problem for Marsaglia (1965) was a regression analysis to model the number of red cells (y) in blood against time (r), y = α + βr. The r-intercept (r = −α/β) was used to estimate the red cell life span. Thus, the distribution of the r-intercept and its expected value were of medical interest for detecting anomalies in patients with a red cell life span shorter than expected. For modelling purposes, Marsaglia (1965) considered the r-intercept to be the ratio of two normal variables.

Marsaglia (1965, 2006) proved that R can always be transformed, by a translation and a change of scale, into a ratio of the form T = (a + U)/(b + V), where U and V are independent standard normal variables and a and b are non-negative constants. Thus, instead of R, it is sufficient to study the ratios of the form T = (a + U)/(b + V). The results in Marsaglia (1965, 2006) are summarised in Theorems 1 and 2.

Theorem 1. Let (X, Y) ∼ BVN(µ, Σ), Eq. 2.1. Then, ∃ s, w, a and b such that:

\[
w\left(\frac{Y}{X} - s\right) = w\,\frac{Y - sX}{X} \sim \frac{a+U}{b+V}
\;\leftrightarrow\;
\frac{Y}{X} \sim \frac{1}{w}\left(\frac{a+U}{b+V}\right) + s
\]

where U and V are two independent standard normal variables. The values of w, s, a and b are given by

\[
s = \rho\sigma_y/\sigma_x, \qquad w = \frac{\sigma_x}{\mp\,\sigma_y\sqrt{1-\rho^2}}
\tag{2.4}
\]

\[
a = \frac{\mu_y/\sigma_y - \rho\mu_x/\sigma_x}{\mp\sqrt{1-\rho^2}}, \qquad b = \mu_x/\sigma_x
\tag{2.5}
\]

Proof. (Sketch) Firstly, the value of s (translation) is selected such that (Y − sX)/X is the ratio of two independent normal variables. Therefore,

\[
\begin{aligned}
E((Y - sX)X) = E(Y - sX)E(X)
&\leftrightarrow E(YX) - sE(X^2) = E(Y - sX)E(X) \\
&\leftrightarrow \rho\sigma_x\sigma_y + \mu_x\mu_y - s(\sigma_x^2 + \mu_x^2) = \mu_x\mu_y - s\mu_x^2 \\
&\leftrightarrow \rho\sigma_x\sigma_y - s\sigma_x^2 = 0 \\
&\leftrightarrow s = \frac{\rho\sigma_y}{\sigma_x}
\end{aligned}
\]

Secondly, the value of w (change of scale) is chosen such that, after scaling, Y − sX and X have standard deviations equal to 1. Thus, w = ±√Var(X)/√Var(Y − sX). Finally, the choice of a and b is straightforward by considering that U ∼ N(0, 1) ↔ a + U ∼ N(a, 1), where N denotes the univariate normal distribution. Consequently, a and b are chosen to be the expected values of (Y − sX)/√Var(Y − sX) and X/√Var(X), respectively. For more details on the proof, refer to Marsaglia (2006). □

Theorem 2. The pdf of the ratio T = (a + U)/(b + V) is a mixture of two heavy-tailed distributions given by:

\[
f(t) = p f_1(t) + (1-p) f_2(t)
\tag{2.6}
\]

where:

\[
f_1(t) = \frac{1}{\pi(1+t^2)}
\quad \text{and} \quad
f_2(t) = \frac{q \int_0^q \exp\left\{ -\frac{x^2 - q^2}{2} \right\} dx}{\pi(1+t^2)\left(\exp\left\{ \frac{a^2+b^2}{2} \right\} - 1\right)}
\]

with:

\[
p = \exp\left\{ -\frac{a^2+b^2}{2} \right\}
\quad \text{and} \quad
q = \frac{b + at}{\sqrt{1+t^2}}
\]

Proof. (Sketch) The cdf of T = (a + U)/(b + V), F(·), is related to the function of Nicholson (Nicholson, 1943), denoted here by H(·), as follows:

\[
F(t) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(t) + 2H\left( \frac{bt - a}{\sqrt{1+t^2}}, \frac{b + at}{\sqrt{1+t^2}} \right) - 2H(b, a)
\tag{2.7}
\]

where:

\[
H(q, h) = \int_0^q \int_0^{hx/q} \varphi(x)\varphi(y)\,dy\,dx
\]

and φ is the standard normal pdf. Then, the pdf of T = (a + U)/(b + V) is obtained by taking the derivative of F(t) (Eq. 2.7) with respect to t. For more details, refer to Marsaglia (1965). □

The pdf f(t) (Eq. 2.6) can differ substantially from a normal. Notice that f1(t) is the standard Cauchy density. Thus, the moments of f(t) do not exist. However, if b > 4, Marsaglia (2006) provides approximate values of the mean (µ) and the variance (σ²) for practical purposes:

\[
\mu = \frac{a}{1.01b - 0.2713}, \qquad
\sigma^2 = \frac{a^2 + 1}{b^2 + 0.108b - 3.795} - \mu^2
\]

The pdf f(t) can be skewed or even bimodal depending on the values of a and b. Once the pdf of T is obtained, it is straightforward to calculate the pdf of R by performing the change of variable t = w(r − s), which leads to f_R(r) = |w| f_T(w(r − s)). An illustration of possible shapes of the pdf of R is shown in Figure 2.1.
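The route from the BVN parameters to the pdf of R via Theorems 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the positive sign is taken in Eq. 2.4 and Eq. 2.5, the mixture p f1(t) + (1 − p) f2(t) is evaluated in the algebraically combined form (the integral in f2 is written with the error function), and the function names are hypothetical. The parameters are those of the right-hand panel of Figure 2.1.

```python
import math

def marsaglia_parameters(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Translation s, scale w and constants a, b of Theorem 1 (positive sign assumed)."""
    root = math.sqrt(1.0 - rho ** 2)
    s = rho * sigma_y / sigma_x
    w = sigma_x / (sigma_y * root)
    a = (mu_y / sigma_y - rho * mu_x / sigma_x) / root
    b = mu_x / sigma_x
    return s, w, a, b

def pdf_t(t, a, b):
    """Eq. 2.6 in combined form: p*f1(t) + (1-p)*f2(t)."""
    q = (b + a * t) / math.sqrt(1.0 + t * t)
    half = 0.5 * (a * a + b * b)
    p = math.exp(-half)
    # q * int_0^q exp{-(x^2 - q^2)/2} dx = q * exp(q^2/2) * sqrt(pi/2) * erf(q/sqrt(2));
    # multiplying by p keeps the exponent non-positive (q^2 <= a^2 + b^2).
    second = (q * math.exp(0.5 * q * q - half)
              * math.sqrt(math.pi / 2.0) * math.erf(q / math.sqrt(2.0)))
    return (p + second) / (math.pi * (1.0 + t * t))

def pdf_r(r, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Change of variable t = w(r - s): f_R(r) = |w| f_T(w(r - s))."""
    s, w, a, b = marsaglia_parameters(mu_x, mu_y, sigma_x, sigma_y, rho)
    return abs(w) * pdf_t(w * (r - s), a, b)

def approx_moments(a, b):
    """Marsaglia's practical approximations of mean and variance of T (b > 4 only)."""
    if b <= 4.0:
        raise ValueError("approximation is only suggested for b > 4")
    mean = a / (1.01 * b - 0.2713)
    var = (a * a + 1.0) / (b * b + 0.108 * b - 3.795) - mean ** 2
    return mean, var

params = (30.0, 31.0, 4.0, 5.0, 0.8)
s, w, a, b = marsaglia_parameters(*params)
approx_mean_t, approx_var_t = approx_moments(a, b)

# Trapezoid rule on [0, 2]; the mass outside is assumed negligible for these parameters.
steps = 4000
h = 2.0 / steps
area = h * (0.5 * pdf_r(0.0, *params)
            + sum(pdf_r(i * h, *params) for i in range(1, steps))
            + 0.5 * pdf_r(2.0, *params))
print(s, w, a, b, approx_mean_t, approx_var_t, area)
```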

Figure 2.1: Different shapes of the probability density function of the ratio of two jointly normal variables. [The parameters used are (µx, µy, σx, σy, ρ) = (2, 38, 8, 24, 0.16) (left); (2, 20, 8, 24, 0.66) (middle) and (30, 31, 4, 5, 0.8) (right).]

2.2.3 Pham-Gia et al. (2006) expression of the pdf

The pdf of R has been studied for the last 80 years and it is still an active field of research. Recently, Pham-Gia et al. (2006) have provided a closed form of the pdf of R in terms of Hermite functions. The pdf is firstly formulated for ratios of independent normal variables (Theorem 3). Then, by using Theorem 1, it is possible to derive the pdf of the ratio of two dependent jointly normal variables (Theorem 4).

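Theorem 3. Let (X, Y) be a random vector of two independent normal variables with parameters as in Eq. 2.1 with σxy = 0. The pdf of R = Y/X is given by:

\[
f_R(r) = \frac{C_1}{\sigma_y^2 + r^2\sigma_x^2}\left[ H_{-2}(s(r)) + H_{-2}(-s(r)) \right]
\tag{2.8}
\]

where:

\[
\begin{aligned}
C_1 &= \frac{\sigma_x\sigma_y}{\pi}\exp\left\{ -\frac{1}{2}\left( \frac{\mu_y^2}{\sigma_y^2} + \frac{\mu_x^2}{\sigma_x^2} \right) \right\} \\
s(r) &= \frac{1}{\sigma_x\sigma_y}\,\frac{\sigma_x^2\mu_y r + \sigma_y^2\mu_x}{\sqrt{2(\sigma_x^2 r^2 + \sigma_y^2)}} \\
H_{-2}(z) &= \int_0^{\infty} t\exp\{-t^2 - 2tz\}\,dt; \quad \text{a particular type of Hermite function}
\end{aligned}
\]

Proof. Firstly, the change of variable X = X and Y = RX is performed. Then, f_{X,R}(x, r) = |x| f_X(x) f_Y(rx), where f_X and f_Y are pdfs of univariate normal distributions. Secondly, f_{X,R} is reparametrised taking ε1 = (2σx²)⁻¹, η1 = −µx/σx², ε2 = (2σy²)⁻¹ and η2 = −µy/σy², which yields the following expressions:

\[
\begin{aligned}
f_X(x) &= \frac{1}{\sqrt{2\pi\sigma_x^2}}\exp\left\{ -\frac{x^2 - 2x\mu_x + \mu_x^2}{2\sigma_x^2} \right\}
= \sqrt{\frac{\epsilon_1}{\pi}}\exp\{-\epsilon_1 x^2 - x\eta_1\}\exp\left\{ -\frac{\eta_1^2}{4\epsilon_1} \right\} \\
f_Y(rx) &= \frac{1}{\sqrt{2\pi\sigma_y^2}}\exp\left\{ -\frac{r^2x^2 - 2xr\mu_y + \mu_y^2}{2\sigma_y^2} \right\}
= \sqrt{\frac{\epsilon_2}{\pi}}\exp\{-\epsilon_2 r^2x^2 - xr\eta_2\}\exp\left\{ -\frac{\eta_2^2}{4\epsilon_2} \right\} \\
f_{X,R}(x, r) &= |x| f_X(x) f_Y(rx)
= |x|\,\frac{\sqrt{\epsilon_1\epsilon_2}}{\pi}\exp\left\{ -\frac{\eta_1^2}{4\epsilon_1} - \frac{\eta_2^2}{4\epsilon_2} \right\}\exp\{-(\epsilon_1 + \epsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\} \\
&= |x|\,K\exp\{-(\epsilon_1 + \epsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\}
\end{aligned}
\]

where:

\[
K = \frac{\sqrt{\epsilon_1\epsilon_2}}{\pi}\exp\left\{ -\frac{\eta_1^2}{4\epsilon_1} - \frac{\eta_2^2}{4\epsilon_2} \right\}
\]

The next step is to integrate f_{X,R} with respect to x.

\[
\begin{aligned}
f_R(r) &= \int_{-\infty}^{\infty} |x| f_X(x) f_Y(rx)\,dx \\
&= -\int_{-\infty}^{0} x f_X(x) f_Y(rx)\,dx + \int_0^{\infty} x f_X(x) f_Y(rx)\,dx \\
&= \int_0^{\infty} x f_X(-x) f_Y(-rx)\,dx + \int_0^{\infty} x f_X(x) f_Y(rx)\,dx \\
&= \int_0^{\infty} xK\exp\{-(\epsilon_1 + \epsilon_2 r^2)x^2 + x(\eta_1 + \eta_2 r)\}\,dx \\
&\quad + \int_0^{\infty} xK\exp\{-(\epsilon_1 + \epsilon_2 r^2)x^2 - x(\eta_1 + \eta_2 r)\}\,dx
\end{aligned}
\]

Taking t = x√(ε1 + ε2r²) yields:

\[
\begin{aligned}
f_R(r) &= K\int_0^{\infty} \frac{t}{\epsilon_1 + \epsilon_2 r^2}\exp\left\{ -t^2 + 2t\,\frac{\eta_1 + \eta_2 r}{2\sqrt{\epsilon_1 + \epsilon_2 r^2}} \right\}dt \\
&\quad + K\int_0^{\infty} \frac{t}{\epsilon_1 + \epsilon_2 r^2}\exp\left\{ -t^2 - 2t\,\frac{\eta_1 + \eta_2 r}{2\sqrt{\epsilon_1 + \epsilon_2 r^2}} \right\}dt
\end{aligned}
\]

Considering the definition of H−2(z), it becomes evident that:

\[
f_R(r) = \frac{K}{\epsilon_1 + \epsilon_2 r^2}\left[ H_{-2}(s(r)) + H_{-2}(-s(r)) \right]
\]

with:

\[
s(r) = \frac{1}{2}\,\frac{\eta_1 + r\eta_2}{\sqrt{\epsilon_1 + r^2\epsilon_2}}
\]

Substituting ε1, ε2, η1 and η2 by their expressions in terms of σx, σy, µx and µy, the result follows (Pham-Gia et al., 2006). □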

Theorem 4. Let (X, Y) ∼ BVN(µ, Σ), Eq. 2.1. The pdf of R = Y/X is given by:

\[
f_R(r) = \frac{C_2}{\sigma_x^2 r^2 - 2\rho\sigma_x\sigma_y r + \sigma_y^2}\left[ H_{-2}(l(r)) + H_{-2}(-l(r)) \right]
\]

where:

\[
\begin{aligned}
C_2 &= \frac{\sigma_x\sigma_y\sqrt{1-\rho^2}}{\pi}\exp\left\{ -\frac{\sigma_x^2\mu_y^2 - 2\rho\sigma_x\sigma_y\mu_x\mu_y + \mu_x^2\sigma_y^2}{2(1-\rho^2)\sigma_x^2\sigma_y^2} \right\} \\
l(r) &= \frac{-\sigma_x^2\mu_y r + \rho\sigma_y\sigma_x(\mu_x r + \mu_y) - \mu_x\sigma_y^2}{\sqrt{2\sigma_x^2\sigma_y^2(1-\rho^2)(\sigma_x^2 r^2 - 2\rho\sigma_y\sigma_x r + \sigma_y^2)}}
\end{aligned}
\]

Proof. (Sketch) Pham-Gia et al. (2006) suggested following the same steps as in Theorem 3. Alternatively, the proof can be done by considering T = (a + U)/(b + V) (see Theorem 1) and its pdf given by Theorem 3:

\[
f_T(t) = \frac{1}{\pi(1+t^2)}\exp\left\{ -\frac{a^2+b^2}{2} \right\}
\left[ H_{-2}\left( \frac{at+b}{\sqrt{2(t^2+1)}} \right) + H_{-2}\left( -\frac{at+b}{\sqrt{2(t^2+1)}} \right) \right]
\]

By Theorem 1, R is distributed as T/w + s, with w and s given in Eq. 2.4. Then, performing the change of variable t = (r − s)w, the pdf of R is given by:

\[
\begin{aligned}
f_R(r) &= |w|\,f_T[(r-s)w] \\
&= |w|\,\frac{\exp\left\{ -\frac{a^2+b^2}{2} \right\}}{\pi(1+(r-s)^2w^2)}
\left[ H_{-2}\left( \frac{a(r-s)w+b}{\sqrt{2((r-s)^2w^2+1)}} \right) + H_{-2}\left( -\frac{a(r-s)w+b}{\sqrt{2((r-s)^2w^2+1)}} \right) \right]
\end{aligned}
\]

Substituting a, b, s and w by their expressions with respect to µx, µy, σx, σy and ρ, the result follows. □

It can be shown that the pdf in Eq. 2.8 is equivalent to the one in Eq. 2.6 (see Appendix C). Furthermore, Lemma 1 in Pham-Gia et al. (2006) proves that H−2(z) + H−2(−z) = ₁F₁(1, 1/2, z²), where ₁F₁(α, γ, z) is defined as:

\[
{}_1F_1(\alpha, \gamma, z) = \sum_{k=0}^{\infty} \frac{(\alpha, k)}{(\gamma, k)}\,\frac{z^k}{k!}, \qquad \gamma \neq 0, -1, -2, \ldots
\tag{2.9}
\]

and (α, k) = α(α + 1) ... (α + k − 1).

The advantage of expressing the pdf of T = (a + U)/(b + V) as

\[
f_T(t) = \frac{C_1}{1+t^2}\,{}_1F_1\left(1, 1/2, s^2(t)\right)
\quad \text{with} \quad
C_1 = \frac{\exp\{-(a^2+b^2)/2\}}{\pi}
\quad \text{and} \quad
s(t) = \frac{at+b}{\sqrt{2(t^2+1)}}
\]

is that it allows one to calculate analytically the first derivative of f_T(t) and study its sign for different positive values of a and b. By doing so, Pham-Gia et al. (2006) obtained the following results:

1. f_T(t) always has a mode in ]0, a/b].

2. If a = 0, or b = 0 and a ≤ 1, f_T(t) is unimodal.

3. If b = 0 and a > 1, f_T(t) is bimodal with symmetric modes.

4. If b > 0 and a > 0, f_T(t) can be bimodal or unimodal. In the case of bimodality, the second mode belongs to ]−∞, −b/a[.

For further details, refer to Section 5 in Pham-Gia et al. (2006).
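The hypergeometric representation and two of the modality results above can be checked numerically with the short Python sketch below. It is only an illustration: the Hermite function is approximated with a simple trapezoid rule on a truncated range, the series of Eq. 2.9 is summed until the terms become negligible, and the function names and test values (z = 0.7, a = 2 and a = 0.5 with b = 0) are arbitrary choices.

```python
import math

def hermite_h_minus2(z, steps=20000):
    """Trapezoid approximation of H_{-2}(z) = int_0^inf t exp(-t^2 - 2tz) dt."""
    upper = abs(z) + 10.0          # the integrand is negligible beyond this point
    h = upper / steps
    # The integrand is 0 at t = 0 and negligible at t = upper, so only the
    # interior points contribute to the trapezoid sum.
    return h * sum((i * h) * math.exp(-(i * h) ** 2 - 2.0 * (i * h) * z)
                   for i in range(1, steps))

def kummer_1f1(alpha, gamma, x, max_terms=500):
    """Series of Eq. 2.9, summed term by term."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= (alpha + k) / (gamma + k) * x / (k + 1)
        total += term
        if abs(term) < 1e-16 * abs(total):
            break
    return total

def pdf_t(t, a, b):
    """f_T(t) = C1/(1+t^2) * 1F1(1, 1/2, s(t)^2)."""
    c1 = math.exp(-0.5 * (a * a + b * b)) / math.pi
    s = (a * t + b) / math.sqrt(2.0 * (t * t + 1.0))
    return c1 / (1.0 + t * t) * kummer_1f1(1.0, 0.5, s * s)

# Lemma 1: H_{-2}(z) + H_{-2}(-z) = 1F1(1, 1/2, z^2)
z = 0.7
lemma_gap = abs(hermite_h_minus2(z) + hermite_h_minus2(-z) - kummer_1f1(1.0, 0.5, z * z))

# Result 3 (b = 0, a > 1): symmetric density with a dip at t = 0.
bimodal_dip = pdf_t(1.0, 2.0, 0.0) > pdf_t(0.0, 2.0, 0.0)
symmetric = abs(pdf_t(1.0, 2.0, 0.0) - pdf_t(-1.0, 2.0, 0.0)) < 1e-15
# Result 2 (b = 0, a <= 1): the density decreases away from t = 0.
unimodal_peak = pdf_t(0.5, 0.5, 0.0) < pdf_t(0.0, 0.5, 0.0)
print(lemma_gap, bimodal_dip, symmetric, unimodal_peak)
```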

2.3 Normal approximation of the pdf of the ratio

The shape of the pdf of R can differ greatly from the normal in terms of skewness, bimodality and/or heavy tails (see Figure 2.1). However, if the coefficient of variation of the denominator (CVx = σx/µx) tends to zero, the cdf of R, F(r), can be approximated by the cdf of a normal (Fieller, 1932; Hinkley, 1969), as in Eq. 2.10:

\[
\left| F(r) - \Phi\left\{ \frac{\mu_x r - \mu_y}{\sigma_y\sigma_x a(r)} \right\} \right| \le \Phi\left\{ -\frac{\mu_x}{\sigma_x} \right\}
\tag{2.10}
\]

with a(r) defined in Eq. 2.3. This is illustrated in Figure 2.2. However, the limiting value of CVx required for a satisfactory normal approximation cannot be determined exactly because it depends on CVy and ρ (Shanmugalingam, 1982). For instance, Figure 2.3 displays two different shapes of the pdf of R with the same value of CVx (0.13) and two different values of CVy (0.12 and 0.33). The ratio with CVy = 0.12 has an approximately normal shape, whereas the ratio with CVy = 0.33 presents negative skewness.

Marsaglia (2006) provides a list of specific conditions for the normal approximation of the pdf of R. Marsaglia (2006) suggests that the ratios of the form T = (a + U)/(b + V) are approximately normal only when a < 2.25 and b > 4 (Eq. 2.5); see Figure 2.1 (right).

The study in Pham-Gia et al. (2006) is focused on discerning the cases for which the pdf of R is unimodal or bimodal, rather than on investigating the goodness of its approximation by a normal distribution.
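As a small worked illustration of the two criteria discussed in this section, the Python sketch below evaluates the upper bound Φ{−µx/σx} of Eq. 2.10 and the rule of thumb attributed above to Marsaglia (2006). The function names are hypothetical, µx is assumed positive, and absolute values of a and b are used as a simplifying assumption in place of the sign choice in Eq. 2.4 and Eq. 2.5. The parameter sets are those of Figure 2.1 (left and right).

```python
import math

def std_normal_cdf(z):
    # erfc keeps precision in the far lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_approx_error_bound(mu_x, sigma_x):
    """Right-hand side of Eq. 2.10: Phi(-mu_x / sigma_x) = Phi(-1 / CVx)."""
    return std_normal_cdf(-mu_x / sigma_x)

def marsaglia_rule(mu_x, mu_y, sigma_x, sigma_y, rho):
    """True when a < 2.25 and b > 4 for the constants of Eq. 2.5 (absolute values assumed)."""
    a = abs(mu_y / sigma_y - rho * mu_x / sigma_x) / math.sqrt(1.0 - rho ** 2)
    b = abs(mu_x / sigma_x)
    return a < 2.25 and b > 4.0

bound_left = normal_approx_error_bound(2.0, 8.0)      # CVx = 4
bound_right = normal_approx_error_bound(30.0, 4.0)    # CVx of about 0.13
rule_left = marsaglia_rule(2.0, 38.0, 8.0, 24.0, 0.16)
rule_right = marsaglia_rule(30.0, 31.0, 4.0, 5.0, 0.8)
print(bound_left, bound_right, rule_left, rule_right)
```

A large bound, as for the left-hand panel parameters, only says that Eq. 2.10 gives no guarantee; a small bound does not by itself settle the quality of the approximation, which also depends on CVy and ρ as noted above.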

Figure 2.2: Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the denominator (CVx)

Figure 2.3: Probability density function of the ratio of two jointly normal variables for different values of the coefficient of variation of the numerator (CVy) but same CVx

2.4 Estimators of the ratio

2.4.1 Point estimators: average of ratios, ratio of averages

As previously stated, the Cauchy component in the distribution of R leads to the non-existence of E(Y/X) (Marsaglia, 1965, 2006). Instead, E(Y)/E(X) is usually considered for inference purposes (e.g. Lai et al., 2004). In particular, in agriculture, the two main estimators of E(Y)/E(X) are (Qiao et al., 2006):

\[
\hat{R}_A = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{x_i}
\tag{2.11}
\]

\[
\hat{R}_W = \bar{y}/\bar{x} = \sum_{i=1}^{n} y_i \Big/ \sum_{i=1}^{n} x_i
\tag{2.12}
\]

The estimator $\hat{R}_A$ is greatly affected by atypically large values of y_i/x_i, produced when x_i is close to zero. The considerable variation of $\hat{R}_A$ in comparison to $\hat{R}_W$ is illustrated in Figure 2.4, which shows the distribution of 60 bootstrap samples of size 100 for $\hat{R}_A$ (left) and $\hat{R}_W$ (right). Each bootstrap sample was generated from a bivariate normal with µx = µy = 2, σx = σy = 1 and ρ = 0.7.

Figure 2.4: Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right). [Sixty bootstrap samples of size 100 with µx = µy = 2, σx = σy = 1 and ρ = 0.7.]

Notice that if (x_i, y_i), i = 1 ... n, are i.i.d. according to BVN(µ, Σ), Eq. 2.1, $\hat{R}_A$ is an average of ratios of jointly normal variables and $\hat{R}_W$ is a ratio of jointly normal variables. By Marsaglia (1965, 2006), the expected values and the variances of $\hat{R}_A$ and $\hat{R}_W$ do not exist and no direct comparison of their bias and efficiency can be made. Although theoretically neither the expected value nor the variance of either estimator exists, for practical purposes $\hat{R}_W$ is suggested as less sensitive to values x_i → 0, which also results in a less variable estimator in comparison to $\hat{R}_A$ (see Figure 2.4 and Qiao et al. (2006)).

The performance of $\hat{R}_A$ and $\hat{R}_W$ in estimating E(Y)/E(X) when X and Y are independent and normally distributed is studied in Qiao et al. (2006). Their numerical experiments showed that $\hat{R}_A$ and $\hat{R}_W$ can be used when $CV_X = \sigma_x/\mu_x < 0.2$ and $CV_{\bar{X}} = (\sigma_x/\sqrt{n})/\mu_x < 0.2$, respectively. The advantage of using $\hat{R}_W$ is that $CV_{\bar{X}} = (\sigma_x/\sqrt{n})/\mu_x$ depends on the sample size (n). Thus, by choosing $n > 25\sigma_x^2/\mu_x^2$, $CV_{\bar{X}}$ can always be made less than 0.2 (Qiao et al., 2006). Numerical experiments similar to the ones in Qiao et al. (2006) could be carried out when ρ ≠ 0, but to the best of our knowledge this has not been done yet.
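The two estimators and the sample size rule can be illustrated with a few lines of Python. The data below are made up solely to show the sensitivity of the average of ratios to a single denominator close to zero, and the function names are hypothetical.

```python
import math

def average_of_ratios(xs, ys):
    """Eq. 2.11: mean of the individual ratios y_i / x_i."""
    return sum(y / x for x, y in zip(xs, ys)) / len(xs)

def ratio_of_averages(xs, ys):
    """Eq. 2.12: ratio of the sample means (equivalently, of the sums)."""
    return sum(ys) / sum(xs)

def min_sample_size(mu_x, sigma_x, cv_limit=0.2):
    """Smallest integer n such that (sigma_x / sqrt(n)) / mu_x < cv_limit."""
    return math.floor((sigma_x / (cv_limit * mu_x)) ** 2) + 1

# Made-up observations; the last pair has a denominator close to zero.
xs = [1.0, 2.0, 4.0, 0.01]
ys = [2.0, 2.0, 4.0, 1.0]

r_a_clean = average_of_ratios(xs[:3], ys[:3])
r_w_clean = ratio_of_averages(xs[:3], ys[:3])
r_a_all = average_of_ratios(xs, ys)
r_w_all = ratio_of_averages(xs, ys)
n_needed = min_sample_size(2.0, 1.0)   # setting of Figure 2.4: mu_x = 2, sigma_x = 1

print(r_a_clean, r_w_clean, r_a_all, r_w_all, n_needed)
```

With cv_limit = 0.2 the rule reduces to n > 25σx²/µx², which for µx = 2 and σx = 1 gives n > 6.25.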

The estimation of ratios is also of interest in sample surveys. If it is reasonable to assume a linear relationship (Y = RX) between Y , the variable of interest, and X, an auxiliary variable, we can estimate the mean or the total of Y with greater precision by taking advantage of the correlation between both variables. The use of ratio estimators in survey problems is discussed in Chapter 6 of Cochran (1977). In particular, we refer to p. 175 for a discussion of the bias correction of RˆA when used to estimate E(Y )/E(X) and E(X) is known. However, the estimation of ratios in sample surveys is quite different to the objective of this thesis where we are interested in the estimation of the ratio per se rather than in Y .

2.4.2 Confidence sets of the ratio of expected values

In this section I review two equivalent procedures for calculating the confidence sets of E(Y )/E(X). The first procedure was derived by Fieller (1954) and is the most common among statisticians (Jarret 2012, pers. comm., December). The second one was derived in Von Luxburg & Franz (2009) and provides a geometrical insight to Fieller’s rule.

2.4.2.1 Fieller’s Theorem

Theorem 5. Let $(X, Y) \sim$ BVN(µ, Σ), Eq. 2.1. The exact confidence set (S) of α = E(Y)/E(X) has three possible configurations: 1) completely unbounded confidence sets, 2) exclusive unbounded confidence sets and 3) bounded intervals (Fieller, 1954).

Proof. Here we develop the proof given in Casella & Berger (2002, p.464) in detail.

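Let the paired variable $(X_i, Y_i)$, $i = 1, \dots, n$, be independent and identically distributed (i.i.d.) according to BVN(µ, Σ), Eq. 2.1. Let us define $\theta = \hat{\mu}_y - \alpha\hat{\mu}_x$. It is easy to verify that θ is normally distributed with:

\[
\begin{aligned}
E(\theta) &= E(\hat{\mu}_y) - \alpha E(\hat{\mu}_x) = \mu_y - \frac{\mu_y}{\mu_x}\mu_x = 0 \\
Var(\theta) &= Var(\hat{\mu}_y) + \alpha^2 Var(\hat{\mu}_x) - 2\alpha\, Cov(\hat{\mu}_y, \hat{\mu}_x) = (\sigma_y^2 + \alpha^2\sigma_x^2 - 2\alpha\sigma_{xy})/n
\end{aligned}
\]

where Cov denotes the covariance. By applying the two well-known results given by Eq. 2.13 and 2.14 (see Wackerly et al., 1996, p.294-297), we get Eq. 2.15.

\[ \frac{(n-1)\widehat{Var}(\theta)}{Var(\theta)} \sim \chi^2_{n-1} \tag{2.13} \]

\[ \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim T_{n-1} \tag{2.14} \]

\[ \frac{\theta/\sqrt{Var(\theta)}}{\sqrt{[(n-1)\widehat{Var}(\theta)/Var(\theta)]/(n-1)}} = \frac{\theta}{\sqrt{\widehat{Var}(\theta)}} = \frac{\hat{\mu}_y - \alpha\hat{\mu}_x}{\sqrt{(\hat{\sigma}_y^2 + \alpha^2\hat{\sigma}_x^2 - 2\alpha\hat{\sigma}_{xy})/n}} \sim T_{n-1} \tag{2.15} \]

where $\chi^2_{n-1}$ and $T_{n-1}$ refer to a chi-square distribution and a Student's t-distribution with $n - 1$ degrees of freedom, respectively. By Eq. 2.15, the exact confidence set of α is calculated to satisfy Eq. 2.16:

\[ \frac{(\hat{\mu}_y - \alpha\hat{\mu}_x)^2}{(\hat{\sigma}_y^2 + \alpha^2\hat{\sigma}_x^2 - 2\alpha\hat{\sigma}_{xy})/n} \le t^2_{n-1,\,1-\gamma/2} \tag{2.16} \]

where $t_{n-1,\,1-\gamma/2}$ is the $1 - \gamma/2$ quantile of $T_{n-1}$. For convenience, we refer to $t^2_{n-1,\,1-\gamma/2}$ as $t^2$ further in the text. Inequality 2.16 can be rewritten as:

\[
\begin{aligned}
\hat{\mu}_y^2 + \alpha^2\hat{\mu}_x^2 - 2\alpha\hat{\mu}_x\hat{\mu}_y - t^2(\hat{\sigma}_y^2 + \alpha^2\hat{\sigma}_x^2 - 2\alpha\hat{\sigma}_{xy})/n &\le 0 \\
(\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n)\alpha^2 + (-2\hat{\mu}_x\hat{\mu}_y + 2t^2\hat{\sigma}_{xy}/n)\alpha + (\hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n) &\le 0
\end{aligned}
\tag{2.17}
\]

Assuming $\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n \neq 0$, the left hand side of Eq. 2.17 is a parabola ($a\alpha^2 + b\alpha + c$) with respect to α where:

\[
\begin{aligned}
a &= \hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n \\
b &= 2(-\hat{\mu}_x\hat{\mu}_y + t^2\hat{\sigma}_{xy}/n) \\
c &= \hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n
\end{aligned}
\tag{2.18}
\]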

The solution of inequality 2.16 has four scenarios, one of which (number 3 in the list below) is not feasible for the values a, b and c in Eq. 2.18:

• $a < 0$

1. If $b^2 - 4ac < 0$, we obtain a completely unbounded interval, $S = \mathbb{R}$.

2. If $b^2 - 4ac \ge 0$, we get an exclusive unbounded interval, $S = ]-\infty, q_1] \cup [q_2, +\infty[$.

• $a > 0$

3. If $b^2 - 4ac < 0$, there is no real solution.

4. If $b^2 - 4ac \ge 0$, we obtain a bounded interval, $S = [q_1, q_2]$.

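where:

\[ q_{1,2} = \frac{(\hat{\mu}_x\hat{\mu}_y - t^2\hat{\sigma}_{xy}/n) \mp \sqrt{(-\hat{\mu}_x\hat{\mu}_y + t^2\hat{\sigma}_{xy}/n)^2 - (\hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n)(\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n)}}{\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n} \]

Figure 2.5: Feasible cases of Fieller's confidence set of the ratio of expected values

It is straightforward to see that: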

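\[
\begin{aligned}
a > 0 &\equiv \frac{\hat{\mu}_x^2}{\hat{\sigma}_x^2/n} > t^2 \\
b^2 - 4ac \ge 0 &\equiv (-\hat{\mu}_x\hat{\mu}_y + t^2\hat{\sigma}_{xy}/n)^2 - (\hat{\mu}_y^2 - t^2\hat{\sigma}_y^2/n)(\hat{\mu}_x^2 - t^2\hat{\sigma}_x^2/n) \ge 0 \\
&\equiv t^4(\hat{\sigma}_{xy}^2/n^2 - \hat{\sigma}_x^2\hat{\sigma}_y^2/n^2) + t^2(-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}/n + \hat{\mu}_y^2\hat{\sigma}_x^2/n + \hat{\mu}_x^2\hat{\sigma}_y^2/n) \ge 0 \\
&\equiv t^2(\hat{\sigma}_{xy}^2/n^2 - \hat{\sigma}_x^2\hat{\sigma}_y^2/n^2) + (-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}/n + \hat{\mu}_y^2\hat{\sigma}_x^2/n + \hat{\mu}_x^2\hat{\sigma}_y^2/n) \ge 0 \\
&\equiv \frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}/n + \hat{\mu}_y^2\hat{\sigma}_x^2/n + \hat{\mu}_x^2\hat{\sigma}_y^2/n}{\hat{\sigma}_x^2\hat{\sigma}_y^2/n^2 - \hat{\sigma}_{xy}^2/n^2} = n\left[\frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2}\right] \ge t^2
\end{aligned}
\]

Let us denote:

\[
\begin{aligned}
q_{exclusive} &= \frac{\hat{\mu}_x^2}{\hat{\sigma}_x^2/n} \\
q_{complete} &= n\left[\frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2}\right]
\end{aligned}
\tag{2.19}
\]

The conditions presented in terms of a, b and c can be rewritten in terms of $q_{complete}$ and $q_{exclusive}$ (Table 2.1).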

Table 2.1: Conditions for the four scenarios of Fieller’s confidence sets

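Conditions on a, b and c | Equivalent conditions on $q_{exclusive}$, $q_{complete}$ | Fieller's solutions
--------------------------------------------------------------------------------------------------
$a < 0$ and $b^2 - 4ac < 0$ | $q_{exclusive} < t^2$ and $q_{complete} < t^2$ | $S = \mathbb{R}$
$a < 0$ and $b^2 - 4ac \ge 0$ | $q_{complete} \ge t^2 > q_{exclusive}$ | $S = ]-\infty, q_1] \cup [q_2, +\infty[$
$a > 0$ and $b^2 - 4ac < 0$ | $q_{exclusive} > t^2 > q_{complete}$ | Imaginary case (non-feasible)
$a > 0$ and $b^2 - 4ac \ge 0$ | $q_{complete} \ge t^2$ and $q_{exclusive} > t^2$ | $S = [q_1, q_2]$

* The values of a, b, c and $q_{complete}$, $q_{exclusive}$ are defined in Eq. 2.18 and Eq. 2.19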

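Now we verify that the imaginary case is not feasible for the defined values of a, b and c or, equivalently, for the values of $q_{complete}$ and $q_{exclusive}$. The proof is done by contradiction. The imaginary case implies that $q_{exclusive} > q_{complete}$ or, equivalently, $q_{exclusive} - q_{complete} > 0$. However,

\[
\begin{aligned}
q_{exclusive} - q_{complete} &= \frac{\hat{\mu}_x^2}{\hat{\sigma}_x^2/n} - n\left[\frac{-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2}\right] \\
&= n\left[\frac{\hat{\mu}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2) - (-2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy} + \hat{\mu}_y^2\hat{\sigma}_x^2 + \hat{\mu}_x^2\hat{\sigma}_y^2)\hat{\sigma}_x^2}{\hat{\sigma}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2)}\right] \\
&= n\left[\frac{\hat{\mu}_x^2\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\mu}_x^2\hat{\sigma}_{xy}^2 + 2\hat{\mu}_x\hat{\mu}_y\hat{\sigma}_{xy}\hat{\sigma}_x^2 - \hat{\mu}_y^2\hat{\sigma}_x^4 - \hat{\mu}_x^2\hat{\sigma}_x^2\hat{\sigma}_y^2}{\hat{\sigma}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2)}\right] \\
&= n\left[\frac{-(\hat{\mu}_y\hat{\sigma}_x^2 - \hat{\mu}_x\hat{\sigma}_{xy})^2}{\hat{\sigma}_x^2(\hat{\sigma}_x^2\hat{\sigma}_y^2 - \hat{\sigma}_{xy}^2)}\right] \le 0
\end{aligned}
\]

which contradicts $q_{exclusive} - q_{complete} > 0$.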

If $a = 0$, inequality 2.17 is of the form $b\alpha + c \le 0$ with the solution given by:

\[
S = \begin{cases}
[-c/b, +\infty[ & \text{if } b < 0 \\
]-\infty, -c/b] & \text{if } b > 0
\end{cases}
\tag{2.20}
\]

with b and c defined as in Eq. 2.18.
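The case analysis of the proof can be sketched as a small function. This is an illustrative sketch only: the function name, the returned labels and the summary statistics in the usage example are hypothetical, and the quantile t is assumed to be supplied by the caller (for instance from statistical tables).

```python
import math

def fieller_set(mx, my, vx, vy, cxy, n, t):
    """Fieller's confidence set for E(Y)/E(X) from summary statistics.

    mx, my: sample means; vx, vy: sample variances; cxy: sample covariance;
    n: sample size; t: the 1 - gamma/2 quantile of T_{n-1}, supplied by the caller.
    Returns (kind, endpoints), kind in {"complete", "exclusive", "bounded", "half-line"}.
    """
    t2 = t * t
    a = mx ** 2 - t2 * vx / n                 # Eq. 2.18
    b = 2.0 * (-mx * my + t2 * cxy / n)
    c = my ** 2 - t2 * vy / n
    if a == 0.0:
        if b == 0.0:
            raise ValueError("a = b = 0 is not covered by Eq. 2.20")
        # Linear inequality b*alpha + c <= 0 (Eq. 2.20): lower bound if b < 0, upper bound if b > 0.
        return ("half-line", ("lower" if b < 0.0 else "upper", -c / b))
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        if a > 0.0:
            raise ArithmeticError("a > 0 with a negative discriminant should not occur")
        return ("complete", None)             # S = R
    root = math.sqrt(disc)
    q1, q2 = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    if a < 0.0:
        return ("exclusive", (q1, q2))        # S = ]-inf, q1] U [q2, +inf[
    return ("bounded", (q1, q2))              # S = [q1, q2]

if __name__ == "__main__":
    # Hypothetical summary statistics chosen to trigger each feasible case.
    print(fieller_set(10.0, 20.0, 1.0, 4.0, 0.0, 100, 2.0))
    print(fieller_set(0.1, 10.0, 1.0, 1.0, 0.0, 10, 2.0))
    print(fieller_set(0.1, 0.1, 1.0, 1.0, 0.0, 10, 2.0))
```

The roots are sorted before being returned because, when a < 0, the two expressions for q1 and q2 come out in reversed order.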

Finally, let us highlight that confidence sets of the ratio of expected values may be unbounded. This can never be inferred when treating the ratio as a normal variable and calculating the confidence intervals by the normal theory. Thus, such normal-theory intervals have low coverage³ even when the denominator is significantly different from zero and/or for large sample sizes (Franz, 2007).



2.4.2.2 Von Luxburg & Franz (2009) geometric solution

Recently, Von Luxburg & Franz (2009) have presented a geometric construction for confidence sets of E(Y)/E(X) which coincide with the ones in Fieller (1954). The intuitive idea in Von Luxburg & Franz (2009) is outlined as follows. Let (X, Y) be BVN(µ, Σ), Eq. 2.1. Let us define $L_r = \{(x, y) \in \mathbb{R}^2 : y = rx\}$, the line of slope r which passes through the origin. If $S = [l, u]$ is the confidence interval of E(Y)/E(X) at the confidence level $1 - \gamma$, then we can construct a wedge (W) given by the area of the plane enclosed by $L_l$ and $L_u$. The wedge comprises all the lines $L_r$ such that $r \in [l, u]$ (see Figure 2.6, left). On the other hand, we can construct S by intersecting W with the vertical line $\{(x, y) \in \mathbb{R}^2 : x = 1\}$ (see Figure 2.6, right). Thus, providing an appropriate W in $\mathbb{R}^2$ is equivalent to calculating the confidence set of E(Y)/E(X). Von Luxburg & Franz (2009) prove that such W is given by the area enclosed by the tangent lines to the ellipse (E) through the origin, where $E = \{z \in \mathbb{R}^2 : (z - \hat{\mu})\hat{\Sigma}^{-1}(z - \hat{\mu})^{\top} = t^2/n\}$ and t is the $1 - \gamma/2$ quantile of $T_{n-1}$. The results in Von Luxburg & Franz (2009) are presented in Theorems 6 and 7.

Theorem 6. Let (X,Y ) be BVN(µ, Σ) distributed, Eq. 2.1. The exact confidence set for the ratio α = E(Y )/E(X) can be constructed in the following steps.

1. Estimate µˆx, µˆy and Σˆ

2. Take t, the $1 - \gamma/2$ quantile of $T_{n-1}$

³ The probability that the interval contains the true parameter

Figure 2.6: Construction of a wedge given the confidence interval of the ratio of means (left) and vice versa (right). [The lower and upper confidence limits are denoted by l and u, respectively.]

3. Plot E

4. Study the position of E

• If $(0, 0) \notin E$, construct W by taking the area enclosed by the tangents to E through the origin. Then, take $S = W \cap \{(x, y) \in \mathbb{R}^2 : x = 1\}$

• If $(0, 0) \in E$, take $S = \mathbb{R}$

The procedure presented above yields the three different cases displayed in Figure 2.7.

Proof. (Sketch) Let us define $\pi_a : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto a_1x + a_2y$, an orthogonal projection on the line $L_{a_2/a_1}$. The proof is in two steps. Firstly, it is demonstrated that $\pi_a(E)$ is a confidence interval of level $1 - \gamma$ of $\pi_a(\mu)$, $\forall a$. In particular, the projection of E on the line $L_{\alpha^{\perp}} = \{(x, y) \in \mathbb{R}^2 : y = -x/\alpha\}$ is a confidence interval of $\pi_{\alpha^{\perp}}(\mu)$. Secondly, it is proved that $\pi_{\alpha^{\perp}}(\mu) \in \pi_{\alpha^{\perp}}(E) \Leftrightarrow \alpha \in S$. Then, it immediately follows that $1 - \gamma = P(\pi_{\alpha^{\perp}}(\mu) \in \pi_{\alpha^{\perp}}(E)) = P(\alpha \in S)$. For more details, refer to Theorem 1 of Von Luxburg & Franz (2009). □

Figure 2.7: Confidence sets of the ratio of expected values in Von Luxburg & Franz (2009).

Notice that if n increases, the area of E decreases and then S shrinks. Furthermore, if E intersects the y-axis in only one point, then the y-axis is one of the tangents of E through the origin and S is given by the solution in Eq. 2.20.

Theorem 7. The confidence sets obtained by the geometric construction in Von Luxburg & Franz (2009) coincide with the ones in Fieller (1954).

Proof. (Sketch) The proof is in two parts. In the first one, it is demonstrated that the feasible cases of Fieller’s solutions (Table 2.1) are equivalent to the three cases derived in Von Luxburg & Franz (2009). The equivalence for the completely unbounded interval is demonstrated showing that:

\[ (0, 0) \in E \Leftrightarrow (0 - \hat{\mu})\hat{\Sigma}^{-1}(0 - \hat{\mu})^{\top} \le t^2/n \Leftrightarrow q_{complete} \le t^2 \]

The equivalences for the exclusive unbounded set and the bounded interval are demonstrated by projecting E onto the x-axis. In the exclusive unbounded case, the y-axis is intersected by E. Thus, 0 belongs to the projection of E onto the x-axis, given by $[\hat{\mu}_x - t\sqrt{\hat{\sigma}_x^2/n},\ \hat{\mu}_x + t\sqrt{\hat{\sigma}_x^2/n}]$. Then, the extremes of the latter interval have opposite signs and $(\hat{\mu}_x - t\sqrt{\hat{\sigma}_x^2/n})(\hat{\mu}_x + t\sqrt{\hat{\sigma}_x^2/n}) < 0 \Leftrightarrow q_{exclusive} < t^2$. The same idea is used for the bounded interval, in which case 0 is not contained in $[\hat{\mu}_x - t\sqrt{\hat{\sigma}_x^2/n},\ \hat{\mu}_x + t\sqrt{\hat{\sigma}_x^2/n}]$ and then $(\hat{\mu}_x - t\sqrt{\hat{\sigma}_x^2/n})(\hat{\mu}_x + t\sqrt{\hat{\sigma}_x^2/n}) > 0 \Leftrightarrow q_{exclusive} > t^2$. The second part of the proof shows that $q_1$ and $q_2$ are equal to the slopes of the tangent lines to E through the origin. This can be demonstrated by using Eq. 3 in Walter et al. (2008). For more details on the proof, refer to Von Luxburg & Franz (2004). □

The study of the position of E is a quick procedure to determine whether X or Y are significantly different from zero and then which of the three situations in Section 2.1 applies. The key of this idea is that $\pi_a(E)$ is a confidence interval of $\pi_a(\mu)$, $\forall a$. In particular, this holds for $a_1 = (1, 0)$ and $a_2 = (0, 1)$, which represent the projections onto the x-axis and y-axis, respectively. For instance, let us consider that E intersects the y-axis. By projecting E onto the x-axis, it is clear that $0 \in \pi_{a_1}(E) = [\hat{\mu}_x - t\sqrt{\hat{\sigma}_x^2/n},\ \hat{\mu}_x + t\sqrt{\hat{\sigma}_x^2/n}]$. Thus, $\mu_x$ is not significantly different from zero. The same can be applied to the numerator by projecting E onto the y-axis.
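The position checks described above can be sketched without drawing the ellipse. In this minimal sketch the function name, the dictionary keys and the example statistics are hypothetical, E is treated as a filled region for the origin test, and the tangent case in which E touches the y-axis in a single point (Eq. 2.20) is not handled.

```python
import math

def ellipse_diagnostics(mx, my, vx, vy, cxy, n, t):
    """Locate the ellipse E relative to the origin and the axes.

    mx, my: sample means; vx, vy: sample variances; cxy: sample covariance;
    n: sample size; t: the 1 - gamma/2 quantile of T_{n-1}, supplied by the caller.
    """
    det = vx * vy - cxy ** 2
    # (0 - mu) Sigma^{-1} (0 - mu)^T written out for the 2 x 2 case.
    d2_origin = (mx * mx * vy - 2.0 * mx * my * cxy + my * my * vx) / det
    origin_in_e = d2_origin <= t * t / n
    half_x = t * math.sqrt(vx / n)            # half-width of the projection onto the x-axis
    half_y = t * math.sqrt(vy / n)            # half-width of the projection onto the y-axis
    x_proj = (mx - half_x, mx + half_x)
    y_proj = (my - half_y, my + half_y)
    if origin_in_e:
        kind = "complete"                     # S = R
    elif x_proj[0] < 0.0 < x_proj[1]:
        kind = "exclusive"                    # E crosses the y-axis
    else:
        kind = "bounded"
    return {
        "kind": kind,
        "x_projection": x_proj,
        "y_projection": y_proj,
        "x_differs_from_zero": not (x_proj[0] <= 0.0 <= x_proj[1]),
        "y_differs_from_zero": not (y_proj[0] <= 0.0 <= y_proj[1]),
    }

if __name__ == "__main__":
    # Hypothetical summary statistics, one for each of the three cases.
    for stats in [(0.1, 0.1, 1.0, 1.0, 0.0, 10, 2.0),
                  (0.1, 10.0, 1.0, 1.0, 0.0, 10, 2.0),
                  (10.0, 20.0, 1.0, 4.0, 0.0, 100, 2.0)]:
        print(ellipse_diagnostics(*stats))
```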

2.5 On the distributional properties of internal nitrogen use efficiency in rice.

In this section I apply the previously described theory to a real data set. The data set comprises 24 plot observations of NU and GY from non-irrigated rice in northeast Thailand (Figure 2.8) from the experiments carried out by Naklang et al. (2006). All the plots received a dose of fertiliser of 0 kg Nitrogen (N) ha$^{-1}$, 21.8 kg Phosphorus (P) ha$^{-1}$ and 41.5 kg Potassium (K) ha$^{-1}$ and were wet after the flowering period of the rice.

Using this data set, I provide examples of the pdf of IEN , the bootstrap distribution of estimators of E(GY )/E(NU) and confidence intervals of E(GY )/E(NU) for this sample.

Figure 2.8: Grain yield versus nitrogen uptake from a sample of non-irrigated rice in northeast Thailand (Naklang et al., 2006)

1. The pdf of IEN

Let us assume that the true parameters of the population of non-irrigated rice under the fertiliser and water conditions described above coincide with the sample mean and variance-covariance (Eq. 2.1) given by:

\[ \mu = (49.47,\ 2135.42); \qquad \Sigma = \begin{pmatrix} 416.86 & 10986 \\ 10986 & 609382.4 \end{pmatrix} \tag{2.21} \]

For these values of the parameters, the pdf of IEN (Figure 2.9) is clearly skewed to the right with tails heavier than the normal ones. The values of a and b (Eq. 2.5) are equal to 1.48 and 2.42, respectively. Thus, the distribution of IEN for this population does not have a satisfactory normal approximation (Section 2.3).

2. Bootstrap distribution of estimators of E(GY )/E(NU).

The arithmetic mean of the observations of IEN ($\hat{R}_A$, Eq. 2.11) is usually employed to estimate E(GY)/E(NU). However, the bootstrap distribution of this estimator presents much more variation than the distribution of the ratio of the arithmetic means of GY and NU ($\hat{R}_W$, Eq. 2.12). This fact is illustrated in Figure 2.10, which displays the bootstrap distribution of these estimators when each bootstrap sample is generated from a BVN with parameters given in Eq. 2.21.

Figure 2.9: Probability density function of the ratio of two jointly normal variables with parameters given in Eq. 2.21

Figure 2.10: Bootstrap distribution of the estimators defined in Eq. 2.11 (left) and Eq. 2.12 (right) for the data in Fig. 2.8. [Sixty bootstrap samples of size 100 with parameters given in Eq. 2.21]

3. Confidence interval of E(GY)/E(NU).

The Fieller's confidence set of E(GY)/E(NU) calculated from the sample in Figure 2.8 is a bounded interval equal to [37.93, 49.45]. However, even if the Fieller's confidence sets are bounded, they can substantially differ from the ones obtained when assuming that IEN has a normal distribution (in this case [39.93, 52.56]). Additionally, confidence intervals calculated according to the normal theory may have low coverage even when NU is far from zero (Franz, 2007).
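As a rough numerical check of the bounded interval, the quadratic of Eq. 2.17 can be solved with the sample statistics of Eq. 2.21 and n = 24. The confidence level is an assumption here: the sketch takes γ = 0.05, so that t is the 0.975 quantile of a t-distribution with 23 degrees of freedom (about 2.0687 from standard tables); variable names are illustrative.

```python
import math

# Sample statistics from Eq. 2.21 (x = NU, y = GY) and n = 24 plots.
n = 24
mx, my = 49.47, 2135.42
vx, cxy, vy = 416.86, 10986.0, 609382.4

# Assumption: gamma = 0.05, so t is the 0.975 quantile of T_23 (about 2.0687).
t = 2.0687
t2 = t * t

# Coefficients of the quadratic in alpha (Eq. 2.18).
a = mx ** 2 - t2 * vx / n
b = 2.0 * (-mx * my + t2 * cxy / n)
c = my ** 2 - t2 * vy / n
disc = b * b - 4.0 * a * c

# a > 0 and disc >= 0 correspond to the bounded case S = [q1, q2].
lower = (-b - math.sqrt(disc)) / (2.0 * a)
upper = (-b + math.sqrt(disc)) / (2.0 * a)

print(f"a = {a:.2f}, discriminant = {disc:.3e}")
print(f"Fieller interval: [{lower:.2f}, {upper:.2f}]")
```

Under the stated assumption the endpoints should land close to the interval reported in the text; a different confidence level would change t and therefore the endpoints.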

2.6 Summary

Many of the issues presented in this chapter (the mixture nature of the ratio distribution, heavy-tailedness, non-existence of moments and uninformative unbounded confidence sets) are not considered in published agricultural research when analysing ratios. In particular, most of the studies in our literature search (Section 1.4) which analysed IEN computed it for each experimental plot ($IEN_i = GY_i/NU_i$, where the subindex i refers to observations on the i-th plot) and applied univariate linear models to the $IEN_i$, $i = 1, \dots, n$, observations. Standard univariate linear models assume normality and homogeneity of variance for the error distribution. From the studies selected, only four (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011; Tétard-Jones et al., 2013) reported to have checked for the departure from normality, and three (Albrizio et al., 2010; Giambalvo et al., 2010; Liu et al., 2011) for the homogeneity of error variances.

Bivariate analyses avoid dealing with the distributional issues of the ratio and maintain the complete information on GY and NU and their joint behaviour, which sheds more light on the utilisation of NU for GY. As discussed in Chapter 1, I propose to use finite mixture models of bivariate Gaussian distributions as a complementary analysis to bivariate mixed models for the GY and NU field data collected across a range of environments. In the next chapter, I review the fundamentals of finite mixture models of multivariate Gaussian distributions with the objective of presenting the theoretical framework of the methodology of this thesis.

Chapter 3

Fundamentals of finite mixture models

3.1 Non-technical introduction

Finite mixture models are commonly used to classify observations from a heterogeneous population into subpopulations¹ (e.g. Crossa & Franco, 2004; Di Zio et al., 2005; Pearson, 1894). For a detailed explanation of the technique, refer to Frühwirth-Schnatter (2006); Lindsay (1995); McLachlan & Peel (2000); Titterington et al. (1985).

To classify observations into subpopulations, it is required to estimate a set of mixture parameters, ψ (Eq. 3.1), such as the mean, variance-covariance and the proportion of individuals for each subpopulation. The maximum likelihood method (see Wackerly et al., 1996, p.398-400) has been the most popular approach to estimating ψ (Frühwirth-Schnatter, 2006, p.49). This method is based on maximising the likelihood, or the log likelihood, function. The likelihood function is defined from the joint probability density function when a realisation of the random sample is given. The maximisation of the likelihood function will give us the values of the parameters which maximise the chance of having observed this sample. However, in the case of mixtures, the log likelihood function, L(ψ), can be complex and its global maximum cannot be calculated explicitly by analytical mathematical methods (Redner & Walker, 1984). Thus, the search for the global maximum is performed numerically via an algorithm called the Expectation and Maximisation (EM) algorithm (Dempster et al., 1977). The EM algorithm is an iterative procedure which is initiated with a starting estimate (ψ0) and then, at each iteration, searches for a value of ψ which increases the value of L(ψ) in comparison to the previous iteration. The algorithm stops when a stopping criterion is fulfilled, e.g. when the difference in the value of L(ψ) between two consecutive iterations is smaller than a chosen threshold (Ng, 2013).

¹ In this thesis, clusters, groups and subpopulations are used interchangeably.

Fitting mixture models with the EM algorithm involves dealing with some challenges. The difficulties are due to the nature of L(ψ) rather than to a failure of the algorithm (McLachlan & Peel, 2000, p.99). The main difficulties are listed as follows.

1. The likelihood function may be unbounded: thus, its global maximum does not exist (Kiefer & Wolfowitz, 1956). The values of ψ for which L(ψ) is unbounded are called singularities. The non-existence of the global maximum implies that a local maximum needs to be chosen as the maximum likelihood estimate (MLE) of ψ (Redner & Walker, 1984).

2. The likelihood function may have multiple local maxima (Seidel et al., 2000). This causes sensitivity of the algorithm to starting and stopping strategies (Seidel et al., 2000) and creates the dilemma of what local maximum to choose as the MLE (Figure 3.1).

3. Some local maxima do not produce meaningful cluster partitions. This happens, for instance, when the clusters are non-realistic for biological reasons (see Section 3.8.3) or when a cluster is fitted to a ‘small localised random pattern’ (McLachlan & Peel, 2000, p.99), for example to a few points which randomly lie on a line. Such local maxima are known as spuriosities (Figure 3.2).

The most common procedure to deal with the issues mentioned above is to decide on the maximum number of components to fit, and then to perform the following steps for each number of components, decreasing until just one component is fitted (Fraley & Raftery, 1998; McLachlan & Peel, 2000; Melnykov & Maitra, 2010; Ng, 2013):

1. Initiate the EM algorithm from different starting points to widen the search for local maximisers.

2. Stop the EM algorithm when the change in L(ψ) between two consecutive iterations is less than a chosen threshold.

3. Check for and remove spuriosities and singularities.

4. Select the local maximum with the highest value of L(ψ) from the remaining solutions.

Having an optimal solution for each number of clusters, another important decision is how to select the number of clusters. Several information criteria are commonly used for this purpose (Melnykov & Maitra, 2010). These criteria favour the models which fit the data well, while at the same time penalise for the model complexity.

A detailed explanation of finite mixture models is presented in Section 3.2 to Section 3.11 and references therein. In order to fully understand the rest of the chapter, the reader needs to have a basic knowledge of probability theory and statistics (see Samuels et al. (2012) for an introduction on the subject for biologists and Wackerly et al. (1996) as a second and deeper reading on the topic) and be familiar with the mathematical notations.

Figure 3.1: Solutions of three clusters obtained by the EM algorithm for the case study (Chapter 4)

Figure 3.2: Spurious solution obtained by the EM algorithm when fitting 7 components to the case study data (Chapter 4). [The green component corresponds to few points which randomly lie on a line and the dark blue and red ones correspond to groups with negative correlations and therefore are biologically meaningless]

3.2 Common use of mixture models

Mixture models are used in a large number of fields, such as agriculture, medicine, engineering, genetics, weather forecasting and image segmentation, and are applied for modelling very different types of data (Titterington et al., 1985, Table 2.1.3). Mixture models are used for two main purposes (Figueiredo & Jain, 2002; Frühwirth-Schnatter, 2006, p.5-6):

1. For cluster analysis where the researcher has some scientific justification in assuming that the data come from different groups but there is little or no information on their classification (e.g. Crossa & Franco, 2004; Di Zio et al., 2005). Although there are other clustering techniques, finite mixture modelling is the only one which: 1) assumes a well characterised mathematical model and 2) allows for parameter estimation by the maximum likelihood method and hypothesis testing on the number of clusters (McLachlan et al., 1996).

2. For approximating distributional shapes whose analytical expression is unknown (e.g. Marron & Wand, 1992; Xu et al., 2010). For instance, Figure 3.3 shows the histogram of a univariate random variable whose distribution is bimodal and asymmetric around the first mode. These types of non-standard distributions can be modelled well using mixture models.

In this thesis, finite mixture models are used for clustering purposes to identify groups in (NU, GY ) field data.

Figure 3.3: Bimodal distribution generated from a mixture of three univariate normal components. [The parameters employed to generate the sample were n = 100, π = (0.25, 0.25, 0.50), θ1 = (0, 1), θ2 = (3, 2) and θ3 = (7, 0.5) where θi = (µi, σi)]

3.3 Mathematical definition

Let $Y = [Y_1, Y_2, \dots, Y_n]$ be a random sample of n p-dimensional random vectors $Y_j$ (of dimension $1 \times p$), $j = 1, \dots, n$. Let $y = [y_1, y_2, \dots, y_n]$ be the observed values of the random sample Y. Finite mixture models assume that the $Y_j$ are i.i.d. according to the finite mixture density (f):

\[ f(y_j | \psi) = \sum_{i=1}^{g} \pi_i f_i(y_j | \theta_i) \tag{3.1} \]

where:

g is the number of components.

$f_i$ is the i-th component density.

$\theta_i$ is the parameter vector of $f_i$.

$\pi = (\pi_1, \dots, \pi_g)$ denotes the mixing proportions, $\pi_i \ge 0\ \forall i$ and $\sum_{i=1}^{g} \pi_i = 1$.

$\psi = (\pi_1, \dots, \pi_{g-1}, \theta_1, \dots, \theta_g, g)$ is the parameter vector of f.
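As an illustration of Eq. 3.1, the mixture density can be evaluated and sampled for the three-component univariate normal mixture described in the caption of Figure 3.3. This is a sketch under stated assumptions: the function names are hypothetical, the mixing proportions are read as (0.25, 0.25, 0.50), and each θi is taken as (mean, standard deviation).

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, thetas):
    """f(y | psi) = sum_i pi_i f_i(y | theta_i) with univariate normal components."""
    return sum(w * normal_pdf(y, mu, sigma) for w, (mu, sigma) in zip(weights, thetas))

def sample_mixture(rng, n, weights, thetas):
    """Draw a component label with probabilities pi, then draw from that component."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(*thetas[i]) for i in labels]

# Parameters as read from the caption of Figure 3.3.
WEIGHTS = (0.25, 0.25, 0.50)
THETAS = ((0.0, 1.0), (3.0, 2.0), (7.0, 0.5))

if __name__ == "__main__":
    rng = random.Random(0)
    ys = sample_mixture(rng, 100, WEIGHTS, THETAS)
    print("sample size:", len(ys))
    for y in (0.0, 3.0, 7.0):
        print(f"f({y}) = {mixture_pdf(y, WEIGHTS, THETAS):.4f}")
```

Sampling in two stages (label first, observation second) anticipates the label-variable interpretation of the mixture used for classification.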

For the bivariate mixture analysis of (NU, GY), it is assumed that every mixture component is a bivariate Gaussian with the pdf:

\[ \phi(y_j | \mu_i, \Sigma_i) = \frac{1}{2\pi|\Sigma_i|^{1/2}} \exp\{-(y_j - \mu_i)\Sigma_i^{-1}(y_j - \mu_i)^{\top}/2\} \]

where:

µi is the mean vector of the i-th component density.

Σi is the variance-covariance matrix of the i-th component density.

3.4 Classifying data into groups: label random vectors and posterior probabilities

Mixture models assume that the information which classifies $Y_j$ into its group is missing. Therefore, a label random variable $Z_j$ is introduced to classify $Y_j$ into g groups (Dempster et al., 1977). The random variable $Z_j$ models the probability of a random draw to have one of g possible outcomes. As a consequence, $Z_j$ has a multinomial distribution (Mult) with parameters m = 1, where m is the number of draws, and $\pi = (\pi_1, \dots, \pi_g)$, the probability of each outcome.

\[ Z_j \sim Mult(1, \pi) \]

For convenience, $Z_j$ is considered as a g-dimensional vector $Z_j = [Z_{ij}]$, $i = 1, \dots, g$ and $j = 1, \dots, n$, defined as follows.

\[
Z_{ij} = \begin{cases}
1, & \text{if } Y_j \text{ belongs to the } i\text{-th group } (G_i), \\
0, & \text{otherwise.}
\end{cases}
\]

In this thesis, it is assumed that each group is generated by one mixture component. Under this assumption, $\pi_i$ is interpreted as the proportion of individuals in $G_i$ or the probability of $Y_j$ to belong to $G_i$, $i = 1, \dots, g$ (McLachlan & Peel, 2000, p.7). The component densities are viewed as the pdf of $Y_j$ conditional on the group of origin, $f_i(y_j | \theta_i) = f(y_j | Z_{ij} = 1, \theta_i)$ (McLachlan & Peel, 2000, p.7). Under this interpretation, it becomes easy to verify the mixture model density using the Total Probability Theorem:

\[ f(y_j | \psi) = \sum_{i=1}^{g} f(y_j, z_j = i | \psi) = \sum_{i=1}^{g} f(y_j | z_j = i, \theta_i) P(z_j = i) = \sum_{i=1}^{g} f_i(y_j | \theta_i)\pi_i \]

The probability that $Y_j$ belongs to $G_i$, once $y_j$ has been observed, is called the posterior probability and denoted by $\tau_{ij}$.

\[ \tau_{ij} = Pr(z_{ij} = 1 | y_j) = \frac{f(y_j | z_{ij} = 1) P(Z_{ij} = 1)}{f(y_j)} = \frac{\pi_i f_i(y_j | \theta_i)}{\sum_{h=1}^{g} \pi_h f_h(y_j | \theta_h)} \tag{3.2} \]

The posterior probabilities are used to classify the data into groups by assigning $y_j$ to $G_i$ if $\tau_{ij} = \max_{r = 1, \dots, g}\{\tau_{rj}\}$ (McLachlan & Peel, 2000, p.31). The calculation of the posterior probabilities requires estimating the parameters of the mixture.
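The posterior probabilities of Eq. 3.2 and the resulting classification rule can be sketched as follows. For brevity the sketch uses univariate normal components rather than the bivariate components of this thesis; the function names are hypothetical and the parameter values are those read from the caption of Figure 3.3, treated here as known.

```python
import math

def normal_pdf(y, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_probabilities(y, weights, thetas):
    """tau_i = pi_i f_i(y) / sum_h pi_h f_h(y) for a single observation y (Eq. 3.2)."""
    joint = [w * normal_pdf(y, mu, sigma) for w, (mu, sigma) in zip(weights, thetas)]
    total = sum(joint)  # mixture density at y; zero only if every term underflows
    return [v / total for v in joint]

def classify(y, weights, thetas):
    """Index (0-based) of the component with the largest posterior probability."""
    tau = posterior_probabilities(y, weights, thetas)
    return max(range(len(tau)), key=tau.__getitem__)

WEIGHTS = (0.25, 0.25, 0.50)
THETAS = ((0.0, 1.0), (3.0, 2.0), (7.0, 0.5))

if __name__ == "__main__":
    for y in (0.0, 3.0, 7.0):
        tau = posterior_probabilities(y, WEIGHTS, THETAS)
        print(y, [round(t, 4) for t in tau], "-> group", classify(y, WEIGHTS, THETAS) + 1)
```

In practice the parameters are unknown and the estimates produced by the EM algorithm would be plugged in instead.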

3.5 Maximum likelihood estimation of the mixture parameters

Since the first publications on finite mixture models (Newcomb, 1886; Pearson, 1894), several methods have been proposed for estimating the mixture parameters (see Everitt, 1996). The maximum likelihood method has been the most successful approach so far (Frühwirth-Schnatter, 2006, p.49). This method aims to find the global maximum of the likelihood function (Eq. 3.3) or the log likelihood function² (Eq. 3.4). The natural way to proceed is to calculate the roots of the first derivative of L(ψ) and then check which one is the global maximum. However, in the case of mixture models, L(ψ) is usually quite complex and the roots of the log likelihood equations cannot be calculated explicitly (Redner & Walker, 1984).

² Notice that the logarithm is a strictly monotone increasing function, so it preserves the maximisers of the likelihood function.

\[ l(\psi) = \prod_{j=1}^{n} \sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i) \tag{3.3} \]

\[ L(\psi) = \sum_{j=1}^{n} \ln\left(\sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i)\right) \tag{3.4} \]
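A direct evaluation of Eq. 3.4 can underflow when an observation lies far from every component, so the sketch below evaluates the inner sum on the log scale. The log-sum-exp device is an implementation choice of this sketch, not something prescribed in the text; the function names are hypothetical and the components are univariate normals for brevity.

```python
import math

def log_normal_pdf(y, mu, sigma):
    """Logarithm of the univariate normal density."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def mixture_log_likelihood(ys, weights, thetas):
    """L(psi) = sum_j ln( sum_i pi_i f_i(y_j; theta_i) ), computed stably."""
    total = 0.0
    for y in ys:
        # log(pi_i) + log f_i(y); components with zero weight contribute nothing.
        terms = [math.log(w) + log_normal_pdf(y, mu, sigma)
                 for w, (mu, sigma) in zip(weights, thetas) if w > 0.0]
        m = max(terms)
        total += m + math.log(sum(math.exp(v - m) for v in terms))
    return total

if __name__ == "__main__":
    ys = [0.0, 1.0, 5.0]                     # hypothetical observations
    weights = (0.5, 0.5)
    thetas = ((0.0, 1.0), (5.0, 1.0))
    print(mixture_log_likelihood(ys, weights, thetas))
```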

The estimation of the mixture parameters has become much easier due to the implementation of iterative algorithms and the development of fast computers. Special mention should be made of what is probably the most widespread algorithm: the Expectation Maximisation algorithm, also known as the EM algorithm (Dempster et al., 1977).

3.6 The EM algorithm for the estimation of mixture parameters

The EM algorithm is an iterative procedure which searches for the global maximum of L(ψ) in the framework of missing data. Although it is not clear who first developed the EM algorithm, it has been generally attributed to Dempster et al. (1977), who generalised the algorithm to an extensive number of applications and discerned for the first time the two well-known Expectation and Maximisation steps (Meng & Van Dyk, 1997). For a brief review of the history of the EM algorithm, refer to Meng & Van Dyk (1997); Redner & Walker (1984).

In the context of mixture models, the realisations of $Z_j$, $j = 1, \dots, n$ (Dempster et al., 1977) are considered missing. Before explaining the algorithm computations, let us introduce the necessary notation. Let $(Z_1, Y_1, \dots, Z_n, Y_n)$ be the complete random sample and $(Y_1, \dots, Y_n)$ the incomplete random sample, with their realisations denoted by $(z_1, y_1, \dots, z_n, y_n)$ and $(y_1, \dots, y_n)$, respectively. The log likelihood function of the complete random sample ($L_c$) is given by:

\[ L_c(z, y, \psi) = \sum_{j=1}^{n} \log f(z_j, y_j | \psi) \]

The log likelihood function of the incomplete random sample (L) is given by:

\[ L(y, \psi) = \sum_{j=1}^{n} \log f(y_j | \psi) \tag{3.5} \]

The EM algorithm approaches the problem of finding $\hat{\psi}$, a local maximiser³ of $L(y, \psi)$, by applying a numerical procedure which aims to maximise $L_c(z, y, \psi)$. However, $L_c(z, y, \psi)$ cannot be maximised directly because it depends on the values of $z_j$, $j = 1, \dots, n$, which are missing. Instead, Dempster et al. (1977) suggested maximising the expected value of $L_c(z, y, \psi)$ given y, a realisation of the incomplete random sample, and $\hat{\psi}$, an estimate of ψ. Therefore, each iteration of the EM algorithm has two steps. Let $\psi^{(r)}$ be the estimate produced in the r-th iteration of the algorithm.

1. Compute $E(L_c(z, y, \psi) | \psi^{(r)}, y)$, denoted as $Q(\psi | \psi^{(r)})$ in the literature (Expectation-step or E-step).

2. Find $\psi^{(r+1)}$ that maximises $Q(\psi | \psi^{(r)})$ (Maximisation-step or M-step).

Dempster et al. (1977) proved that, given an initial estimate $\psi^{(0)}$ and the values of the observed sample y, the EM algorithm produces a sequence $\psi^{(0)}, \psi^{(1)}, \dots, \psi^{(s)}$ with the property given in Theorem 8.

Theorem 8. The sequence $\{L(\psi^{(r)})\}_{r=1}^{s}$, where $\psi^{(r+1)}$ maximises $Q(\psi | \psi^{(r)})$, is non-decreasing.

Proof. Theorem 1 in Dempster et al. (1977). A more detailed proof can be found in Borman (2004). □

³ In the case of mixture models L(y, ψ) may be unbounded, thus the global maximum does not exist, see Section 3.17

The convergence properties of the EM algorithm are discussed in Wu (1983). Wu (1983) stated that if $Q(\psi | \eta)$ is continuous in ψ and η, then the algorithm converges to a local/global maximum or a saddle point.
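The monotonicity property of Theorem 8 can be observed numerically. The sketch below runs the EM iterations for a two-component univariate normal mixture and records the log likelihood after each iteration. It is an illustration only: the data, the starting values and the function names are hypothetical, and the update formulas are the univariate analogues of the usual Gaussian-mixture updates.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(ys, pis, mus, sigmas):
    """Incomplete-data log likelihood L(y, psi)."""
    return sum(math.log(sum(p * normal_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas)))
               for y in ys)

def em_step(ys, pis, mus, sigmas):
    """One EM iteration: posterior probabilities (E-step), then weighted updates (M-step)."""
    tau = []
    for y in ys:
        joint = [p * normal_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas)]
        total = sum(joint)
        tau.append([v / total for v in joint])
    new_pis, new_mus, new_sigmas = [], [], []
    for i in range(len(pis)):
        n_i = sum(row[i] for row in tau)
        mu_i = sum(row[i] * y for row, y in zip(tau, ys)) / n_i
        var_i = sum(row[i] * (y - mu_i) ** 2 for row, y in zip(tau, ys)) / n_i
        new_pis.append(n_i / len(ys))
        new_mus.append(mu_i)
        new_sigmas.append(math.sqrt(var_i))
    return new_pis, new_mus, new_sigmas

def run_em(ys, pis, mus, sigmas, tol=1e-8, max_iter=200):
    """Iterate until the increase in the log likelihood falls below tol."""
    trace = [log_likelihood(ys, pis, mus, sigmas)]
    for _ in range(max_iter):
        pis, mus, sigmas = em_step(ys, pis, mus, sigmas)
        trace.append(log_likelihood(ys, pis, mus, sigmas))
        if trace[-1] - trace[-2] < tol:
            break
    return (pis, mus, sigmas), trace

YS = [-0.5, 0.1, 0.4, 1.2, 4.6, 5.1, 5.5, 6.0]  # hypothetical, two well-separated groups

if __name__ == "__main__":
    (pis, mus, sigmas), trace = run_em(YS, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
    print("log likelihood trace:", [round(v, 4) for v in trace])
    print("estimated means:", [round(m, 3) for m in mus])
```

The stopping rule is the one described in the text: the iterations end when the change in the log likelihood between two consecutive iterations is below a chosen threshold.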

3.7 The EM algorithm for the estimation of parameters of mixtures of multivariate Gaussian distributions

In this section I specify the EM steps for mixtures of multivariate Gaussian distributions. A recommended reference addressing this topic is McLachlan & Peel (2000, Section 3.2-3.3). For details on the calculations of the EM steps, refer to Appendix D.

Let ψ(r) be the estimate at the r-th iteration of the algorithm.

1. E-step. The algorithm updates the posterior probabilities:

\[ \tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \phi(y_j | \mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{h=1}^{g} \pi_h^{(r)} \phi(y_j | \mu_h^{(r)}, \Sigma_h^{(r)})} \tag{3.6} \]

2. M-step. The algorithm updates the estimates of the mixture parameters:

\[ \pi_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big/ n \tag{3.7} \]

\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \tag{3.8} \]

\[ \Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})^{\top}(y_j - \mu_i^{(r+1)}) \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \tag{3.9} \]

3.7.1 Illustration of the EM algorithm on simulated data

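Let us consider a sample of 5 observations from a mixture of two bivariate normals with parameters:

\[
\begin{aligned}
\pi_1 &= 0.5, & \mu_1 &= \begin{pmatrix} 5 \\ 10 \end{pmatrix}^{\top}, & \Sigma_1 &= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \\
\pi_2 &= 0.5, & \mu_2 &= \begin{pmatrix} 20 \\ 20 \end{pmatrix}^{\top}, & \Sigma_2 &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\end{aligned}
\tag{3.10}
\]

The data sample is displayed below in matrix M and plotted in Figure 3.4.

\[
M = \begin{pmatrix}
2.92 & 8.47 \\
19.22 & 20.48 \\
19.67 & 20.21 \\
5.38 & 9.16 \\
19.12 & 19.61
\end{pmatrix}
\]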

Figure 3.4: Scatter plot of a sample from a mixture of two bivariate normals with parameters given in Eq. 3.10. [The points in different clusters are plotted in different colours].

For the purpose of this illustration, the EM algorithm is initiated with the actual value of the parameters ψ(0) = ψ. The algorithm repeats the same operations in each iteration so I only present the calculations in the first iteration.

1. E-STEP (first iteration):

Let us calculate $\tau_{11}^{(1)}$:

\[ \tau_{11}^{(1)} = \frac{0.5\,\phi((2.92, 8.47) | \mu_1, \Sigma_1)}{0.5\,\phi((2.92, 8.47) | \mu_1, \Sigma_1) + 0.5\,\phi((2.92, 8.47) | \mu_2, \Sigma_2)} \approx 1 \]

The rest of $\tau_{ij}^{(1)}$, $i = 1, 2$ and $j = 1, \dots, 5$, are calculated similarly and presented in the matrix $\tau^{(1)}$. The j-th row of $\tau^{(1)}$ contains the posterior probabilities of the observation $y_j$ to belong to the first cluster, $\tau_{1j}^{(1)}$, or to the second one, $\tau_{2j}^{(1)}$.

\[
\tau^{(1)} = \begin{pmatrix}
1 & 0 \\
0 & 1 \\
0 & 1 \\
1 & 0 \\
0 & 1
\end{pmatrix}
\]

2. M-STEP (first iteration):

Let us compute $\pi_1^{(1)}$, $\mu_1^{(1)}$ and $\Sigma_1^{(1)}$ by using Eq. 3.7, 3.8 and 3.9.

\[
\begin{aligned}
\pi_1^{(1)} &= \frac{2}{5} = 0.4 \\
\mu_1^{(1)} &= \frac{1(2.92, 8.47) + 1(5.38, 9.16)}{2} = (4.15, 8.82) \\
\Sigma_1^{(1)} &= \frac{[(2.92, 8.47) - (4.15, 8.82)]^{\top}[(2.92, 8.47) - (4.15, 8.82)]}{2} + \frac{[(5.38, 9.16) - (4.15, 8.82)]^{\top}[(5.38, 9.16) - (4.15, 8.82)]}{2} \\
&= \begin{pmatrix} 1.51 & 0.42 \\ 0.42 & 0.12 \end{pmatrix}
\end{aligned}
\]

The calculations of $\pi_2^{(1)}$, $\mu_2^{(1)}$ and $\Sigma_2^{(1)}$ are performed using the same procedure.
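The first iteration worked through above can be reproduced with a short script. This is a sketch: the function names are hypothetical, the data are the five rows of M, and the starting values are the parameters of Eq. 3.10. The posterior probabilities are not exactly 0 or 1, but they are so close that the rounded values agree with the matrix shown.

```python
import math

def bvn_pdf(y, mu, sigma):
    """Bivariate normal density; sigma is ((s11, s12), (s12, s22))."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def e_step(data, pis, mus, sigmas):
    """Posterior probabilities; one row per observation, one column per component (Eq. 3.6)."""
    tau = []
    for y in data:
        joint = [p * bvn_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas)]
        total = sum(joint)
        tau.append([v / total for v in joint])
    return tau

def m_step(data, tau):
    """Weighted updates of proportions, means and covariance matrices (Eq. 3.7 to 3.9)."""
    pis, mus, sigmas = [], [], []
    for i in range(len(tau[0])):
        w = [row[i] for row in tau]
        n_i = sum(w)
        mu = tuple(sum(wj * y[k] for wj, y in zip(w, data)) / n_i for k in range(2))
        dev = [(y[0] - mu[0], y[1] - mu[1]) for y in data]
        s11 = sum(wj * dx * dx for wj, (dx, dy) in zip(w, dev)) / n_i
        s12 = sum(wj * dx * dy for wj, (dx, dy) in zip(w, dev)) / n_i
        s22 = sum(wj * dy * dy for wj, (dx, dy) in zip(w, dev)) / n_i
        pis.append(n_i / len(data))
        mus.append(mu)
        sigmas.append(((s11, s12), (s12, s22)))
    return pis, mus, sigmas

M = [(2.92, 8.47), (19.22, 20.48), (19.67, 20.21), (5.38, 9.16), (19.12, 19.61)]
PIS = (0.5, 0.5)
MUS = ((5.0, 10.0), (20.0, 20.0))
SIGMAS = (((1.0, 1.0), (1.0, 2.0)), ((1.0, 0.0), (0.0, 1.0)))

if __name__ == "__main__":
    tau = e_step(M, PIS, MUS, SIGMAS)
    pis, mus, sigmas = m_step(M, tau)
    print("rounded posteriors:", [[round(t) for t in row] for row in tau])
    print("pi_1:", round(pis[0], 2))
    print("mu_1:", tuple(round(v, 3) for v in mus[0]))
    print("Sigma_1:", [[round(v, 2) for v in row] for row in sigmas[0]])
```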

3.8 Difficulties in selecting the MLE of mixture models of Gaussian distributions with heteroscedastic components

An important problem when fitting mixture models of Gaussian distributions with unrestricted covariance matrices is how to select the MLE of the mixture parameters (Everitt et al., 2011; Figueiredo & Jain, 2002; Melnykov, 2013; Seo & Kim, 2012). The difficulties encountered are not due to a failure of the EM algorithm but to the char- acteristics of L(ψ) for this type of mixture (McLachlan & Peel, 2000, p.99). The log likelihood function for mixtures of Gaussian distributions with heteroscedastic compo- nents presents:

1. Unboundedness, which leads to the non-existence of the global maximiser. The points where the log likelihood function is unbounded are called singularities. A local maximiser is then chosen as a substitute for the MLE (Redner & Walker, 1984).

2. Multiple maximisers, which create the dilemma of which local maximum to choose as the MLE and produce sensitivity to the initial values of the parameters and to the stopping rules (Seidel et al., 2000).

3. Spuriosities, local maxima which do not correspond to a feasible cluster partition (McLachlan & Peel, 2000, p.99) and which need to be inspected.

3.8.1 Unboundedness of the likelihood function

Kiefer & Wolfowitz (1956) were the first to report the unboundedness of L(ψ). They used the following mixture of univariate Gaussian distributions to illustrate this property:

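$$\pi_1 = 0.5, \qquad \pi_2 = 0.5$$

$$\mu_1 = \mu, \qquad \mu_2 = \mu, \qquad \mu \text{ unknown}$$

$$\sigma_1 = 1, \qquad \sigma_2 = \sigma, \qquad \sigma \text{ unknown}$$

The log likelihood function for this mixture is given by:

$$L(\psi) = \sum_{j=1}^{n} \log\left[\frac{1}{2(2\pi)^{1/2}} \exp\left\{-\frac{(y_j - \mu)^2}{2}\right\} + \frac{1}{2(2\pi)^{1/2}\sigma} \exp\left\{-\frac{(y_j - \mu)^2}{2\sigma^2}\right\}\right]$$

If $y_j = \mu$ for some $j$ and $\sigma \to 0$, then $\exp\{-(y_j - \mu)^2/(2\sigma^2)\} = 1$ and $1/(2(2\pi)^{1/2}\sigma) \to \infty$, so L(ψ) is unbounded.

Similarly, in the case of multivariate Gaussian distributions, the unboundedness of L(ψ) (Eq. 3.11) is produced when $y_j = \mu_i$ and $|\Sigma_i| \to 0$.

$$L(\psi) = \sum_{j=1}^{n} \log \sum_{i=1}^{g} \frac{\pi_i}{(\sqrt{2\pi})^{p}\,|\Sigma_i|^{1/2}} \exp\left\{-(y_j - \mu_i)\Sigma_i^{-1}(y_j - \mu_i)^\top / 2\right\} \qquad (3.11)$$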

The values of the parameters at which L(ψ) becomes unbounded are called singularities. The unboundedness of L(ψ), and therefore the non-existence of the MLE, may seem puzzling at first. Nonetheless, according to Redner & Walker (1984) and McLachlan & Peel (2000, p.43), it is sufficient to find a local maximum of L(ψ) which satisfies certain properties such as efficiency$^4$ and consistency$^5$.
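The unboundedness in the univariate example of Kiefer & Wolfowitz (1956) can be seen numerically. The following is a minimal sketch in Python, assuming a small hypothetical sample and setting µ equal to the first observation; the data values and the function name `log_likelihood` are illustrative only.

```python
import math


def log_likelihood(sample, mu, sigma):
    """Log likelihood of the mixture 0.5*N(mu, 1) + 0.5*N(mu, sigma^2)."""
    total = 0.0
    for y in sample:
        z = (y - mu) ** 2
        comp1 = 0.5 / math.sqrt(2 * math.pi) * math.exp(-z / 2)
        comp2 = 0.5 / (math.sqrt(2 * math.pi) * sigma) * math.exp(-z / (2 * sigma ** 2))
        total += math.log(comp1 + comp2)
    return total


sample = [0.3, -1.2, 0.8, 2.1, -0.5]  # hypothetical data
mu = sample[0]                         # centre both components on one observation
sigmas = (1.0, 1e-2, 1e-4, 1e-8)
values = [log_likelihood(sample, mu, s) for s in sigmas]
for s, v in zip(sigmas, values):
    print(s, v)
```

The printed log likelihood keeps increasing as σ shrinks, because the second component places an ever narrower spike on the observation equal to µ while the first component keeps the remaining terms finite.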

3.8.2 Multiple local maxima

The fact that L(ψ) may present multiple local maxima creates the dilemma of which one to choose as the MLE (Redner & Walker, 1984). Furthermore, even if we widen the search for local maximisers (see Section 3.9.1), we cannot be sure that all have been found (McLachlan & Peel, 2000, p.44). The presence of multiple maximisers results in sensitivity of the EM algorithm to the choice of the initial values of the parameters (seeds) and stopping rules. This fact was shown by Seidel et al. (2000), who aimed to approximate the distribution of the likelihood ratio test statistic, λ, by bootstrapping (see Section 3.11). Seidel et al. (2000) wanted to test whether a sample came from a single exponential distribution with parameter θ or from a mixture of two with parameter vector P. The bootstrap distribution of $-2\log\hat{\lambda} = 2\log L(\hat{P}) - 2\log L(\hat{\theta})$ depends on $\hat{P}$, which is obtained by the EM algorithm. Seidel et al. (2000) showed with examples that, by using different starting values and stopping rules for this algorithm, the values of specific quantiles of $-2\log\hat{\lambda}$ can vary substantially between runs. The sensitivity to the starting positions is also illustrated in Figure 3.1, which shows how two different seeds for the EM algorithm can lead to very different cluster partitions of the same data.

$^4$ An estimator $\hat{\theta}$ is said to be efficient if it is unbiased ($E(\hat{\theta}) = \theta$) and $Var(\hat{\theta})$ is minimum (Wackerly et al., 1996, p.373).
$^5$ An estimator $\hat{\theta}_n$ is said to be consistent if, by increasing the sample size $n$, it gets closer to the parameter that we want to estimate. Formally, $P(|\hat{\theta}_n - \theta| < \epsilon) \to 1$ (Wackerly et al., 1996, p.374).

3.8.3 Spuriosities

There may be some local maxima which do not correspond to feasible cluster partitions. These solutions are called spuriosities. A solution is considered spurious if it is biologically meaningless, denoted here as a biological spuriosity, or if a component has been fitted to very few data points and the determinant of its covariance matrix is small but not zero (McLachlan & Peel, 2000, p.99), denoted here as a mathematical spuriosity.

Biological spuriosities in nitrogen efficiency

Several biological constraints need to be considered in order to identify unfeasible cluster partitions. Firstly, the values of GY and NU are always positive and finite, within a range which varies depending on the environmental conditions and the cultivar used. This information can be supplied by biologists. For example, for the case study in Chapter 4, the values of GY and NU are within a range of 0 to 5000 kg/ha and 0 to 120 kg/ha, respectively. Notice that values close to zero, although possible, indicate biological outliers, e.g. the crops in the plots do not perform well due to flooding. Secondly, the correlation of GY and NU is expected to be positive or zero. This is because N is a major limiting factor for grain production (Xu et al., 2012). Therefore, an increment in NU is expected to increase GY, resulting in a positive correlation between both variables. However, once a plant has accumulated enough N, GY becomes unresponsive to an increment in NU and no correlation between GY and NU is observed (see Figure 1.4, left). Situations in which the amount of N in plants is excessive, arguably as a result of an oversupply of N fertilisers, such that it adversely affects the synthesis of proteins as well as the growth pattern and health of plants to the point of reducing grain production (Goyal & Huffaker, 1984, p.111), are not common in sustainable agriculture. Thus, this scenario is not considered in this work. Due to these biological constraints, cluster partitions which involve negative values for GY and NU or negative correlations between these two traits are considered spurious, for instance the dark blue and red components in Figure 3.2.

In the case of applying the mixture approach to the univariate analysis of $IE_N$, biological spuriosities are solutions which allow negative values of the ratio.

Mathematical spuriosities

A mathematical spuriosity is a local maximum of L(ψ) (Eq. 3.5) for which one cluster has been allocated to a ‘small and localised random pattern’ (McLachlan & Peel, 2000, p.99). This cluster contains few data points and the determinant of the covariance matrix of its corresponding component is small but not zero (McLachlan & Peel, 2000, p.99). Equivalently, a procedure to detect spurious local maxima is to investigate the mixing proportion and the eigenvalues of the covariance matrix for each component (McLachlan & Peel, 2000, p.103). Notice that if one of the eigenvalues of the covariance matrix tends to zero, $|\hat{\Sigma}_i|$ is expected to be small. Recall that if $\hat{\Sigma}_i$ is a real symmetric matrix, we can factorise $\hat{\Sigma}_i$ as follows.

$$\hat{\Sigma}_i = \hat{Q}_i \hat{D}_i \hat{Q}_i^\top \qquad (3.12)$$

where $\hat{Q}_i$ is an orthogonal matrix with the eigenvectors of $\hat{\Sigma}_i$, and $\hat{D}_i$ is a diagonal matrix with the eigenvalues, denoted as $(\hat{\lambda}_{i1}, \ldots, \hat{\lambda}_{ip})$. Thus, $|\hat{\Sigma}_i| = |\hat{Q}_i||\hat{D}_i||\hat{Q}_i^\top|$. As $\hat{Q}_i$ is an orthogonal matrix, its determinant is equal to $\pm 1$, so that $|\hat{Q}_i||\hat{Q}_i^\top| = 1$. Therefore, $|\hat{\Sigma}_i| = \hat{\lambda}_{i1} \cdots \hat{\lambda}_{ip}$.
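The eigenvalue check described above can be sketched for the bivariate case, where the eigenvalues of a symmetric 2 x 2 matrix are available in closed form. This is an illustrative Python sketch only: the covariance matrix, the function names and the thresholds `min_pi` and `min_eig` are assumptions, not values or code from the thesis, and suitable thresholds depend on the scale of the data.

```python
import math


def eigenvalues_2x2(cov):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    (a, b), (_, d) = cov
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius


def looks_spurious(pi, cov, min_pi=0.05, min_eig=1e-3):
    """Flag a component with a small mixing proportion or a near-zero eigenvalue.

    min_pi and min_eig are hypothetical thresholds chosen for illustration.
    """
    _, lam_min = eigenvalues_2x2(cov)
    return pi < min_pi or lam_min < min_eig


cov = [[2.0, 0.6], [0.6, 0.5]]          # hypothetical covariance estimate
lam1, lam2 = eigenvalues_2x2(cov)
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
print(lam1 * lam2, det)                  # the product of eigenvalues equals the determinant
print(looks_spurious(0.4, cov))
print(looks_spurious(0.4, [[1.0, 0.0], [0.0, 1e-6]]))
```

A flagged component would still need to be inspected by eye, since a small eigenvalue can also arise from a genuine, tightly concentrated group.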

3.8.3.1 Visual methods to detect spuriosities

For bivariate Gaussian distributions, spuriosities can also be detected visually by plotting the prediction ellipses$^6$ which depict the means and variance-covariance matrices of the mixture components (Eq. 3.13) (McLachlan & Peel, 2000, p.103). This visual method is in accordance with the ones suggested by Friendly (2006) for multivariate analysis of variance. For instance, a mathematical spuriosity corresponds to a cluster whose prediction ellipse contains few observations and has one or two axes of small length.

$$(y - \hat{\mu}_i)\hat{\Sigma}_i^{-1}(y - \hat{\mu}_i)^\top = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \qquad (3.13)$$

$^6$ The region given in Eq. 3.13 has $100(1-\gamma)\%$ probability of containing a new observation of the sample conditional on belonging to the $i$-th group (see Chew, 1966). In the bivariate case $p = 2$.

where $F(1-\gamma, p, n-p)$ is the $1-\gamma$ quantile of the F distribution with $p$ and $n-p$ degrees of freedom. Spuriosities can be detected from the lengths of the axes because the lengths of the ellipse axes are proportional to the square roots of the eigenvalues of $\hat{\Sigma}_i$. A comprehensive explanation of the latter fact can be found in Rencher (1998), and I summarise it here. Considering that $\hat{\Sigma}_i = \hat{Q}_i \hat{D}_i \hat{Q}_i^\top$, and substituting $\hat{\Sigma}_i$ by its factorisation in Eq. 3.13, it follows that:

$$(y - \hat{\mu}_i)\hat{Q}_i \hat{D}_i^{-1} \hat{Q}_i^\top (y - \hat{\mu}_i)^\top = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)$$

Taking $z_i = \hat{Q}_i^\top (y - \hat{\mu}_i)^\top$, we arrive at:

$$z_i^\top \hat{D}_i^{-1} z_i = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)$$

$$\sum_{t=1}^{2} \frac{z_{it}^2}{\hat{\lambda}_{it}} = \frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p) \qquad (3.14)$$

Eq. 3.14 is a canonical ellipse with the length of axes:

$$\sqrt{\frac{(n-1)p}{n-p}\,\frac{n+1}{n}\,F(1-\gamma, p, n-p)\,\hat{\lambda}_{it}} \qquad (3.15)$$

The multiplication of a vector by an orthogonal matrix acts as a rotation of the vector. Thus, Eq. 3.13 is an ellipse with its origin at $\hat{\mu}_i$, the axes directions given by the eigenvectors of $\hat{\Sigma}_i$, and the axes lengths given by Eq. 3.15.
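The axis lengths of Eq. 3.15 can be computed directly in the bivariate case. The sketch below is in Python and relies on the fact that, for 2 numerator degrees of freedom, the F distribution function has the closed form $1 - (1 + 2x/\nu)^{-\nu/2}$ with $\nu$ denominator degrees of freedom, which can be inverted without a statistical library. The function names, the example covariance matrix and the sample size are hypothetical.

```python
import math


def f_quantile_p2(prob, df2):
    """Quantile of the F distribution with 2 and df2 degrees of freedom.

    Inverts the closed-form cdf 1 - (1 + 2x/df2)^(-df2/2), valid only for
    2 numerator degrees of freedom.
    """
    return (df2 / 2) * ((1 - prob) ** (-2 / df2) - 1)


def ellipse_axes(cov, n, gamma=0.05):
    """Axis lengths of Eq. 3.15 for a bivariate component (p = 2)."""
    p = 2
    scale = (n - 1) * p / (n - p) * (n + 1) / n * f_quantile_p2(1 - gamma, n - p)
    (a, b), (_, d) = cov
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    eigenvalues = (mean + radius, mean - radius)
    return tuple(math.sqrt(scale * lam) for lam in eigenvalues)


axes = ellipse_axes([[4.0, 0.0], [0.0, 1.0]], n=30)  # hypothetical component
print(axes)
```

With eigenvalues 4 and 1, the first axis is twice as long as the second, which illustrates the proportionality to the square roots of the eigenvalues; a very short axis signals a near-zero eigenvalue.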

For mixture models of univariate Gaussian components, spuriosities can be detected by plotting the pdf of each component. For instance, a mathematical spuriosity will present a component with small variance (e.g. McLachlan & Peel, 2000, p.100).

3.9 Strategy to select the MLE of mixtures with heteroscedastic Gaussian components

Selecting the MLE for mixtures of unrestricted covariance matrices is a difficult task due to the presence of singularities, multiple local maxima and spuriosities. Let us consider a mixture of g components. The most common strategy for selecting the MLE is to perform the following steps (Biernacki et al., 2006; McLachlan & Peel, 2000; Melnykov & Maitra, 2010; Ng, 2013):

1. Initiate the EM algorithm with different starting strategies to identify as many local maximisers as possible (see Section 3.9.1).

2. Stop the EM algorithm when a stopping criterion is fulfilled, e.g. the change in L(ψ) between two consecutive iterations is less than a chosen threshold.

3. Check for and delete spuriosities and singularities.

Then, the MLE is considered a solution which gives the highest value of L(ψ).
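The steps above can be sketched as a small selection routine. This is only an illustration in Python: it assumes that each EM run has already been summarised by its final log likelihood, its smallest mixing proportion and its smallest covariance eigenvalue, and the thresholds are hypothetical. Biological spuriosities cannot be screened in this way and still require inspection of the fitted clusters.

```python
def has_converged(loglik_old, loglik_new, tol=1e-6):
    """Stopping rule: change in log likelihood below a chosen threshold."""
    return abs(loglik_new - loglik_old) < tol


def select_mle(runs, min_pi=0.05, min_eig=1e-3):
    """Return the retained run with the highest log likelihood, or None.

    Each run is a dict with keys 'loglik', 'min_pi' and 'min_eig'.
    The thresholds min_pi and min_eig are hypothetical.
    """
    kept = [r for r in runs if r["min_pi"] >= min_pi and r["min_eig"] >= min_eig]
    return max(kept, key=lambda r: r["loglik"]) if kept else None


# Hypothetical summaries of three EM runs started from different seeds.
runs = [
    {"seed": 1, "loglik": -512.3, "min_pi": 0.31, "min_eig": 0.80},
    {"seed": 2, "loglik": -498.7, "min_pi": 0.02, "min_eig": 1e-7},  # spurious
    {"seed": 3, "loglik": -505.1, "min_pi": 0.28, "min_eig": 0.65},
]
best = select_mle(runs)
print(best["seed"])
```

The second run has the highest log likelihood but is discarded because one component is fitted to very few points with a nearly singular covariance matrix, so the third run is retained.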

The logic of this procedure is based on the theoretical results obtained by Hathaway (1985), who constrained the parameter space to avoid singularities and spurious solutions for mixtures of univariate normal distributions. The constrained parameter space was:

$$\Omega_c = \{\psi \in \Omega \mid \pi_i \geq \epsilon,\ \sigma_i \geq c\,\sigma_k,\ 1 \leq i \neq k \leq g\}, \qquad c > 0$$

Hathaway (1985) demonstrated that the constrained global maximiser was consistent, given that ψ belongs to $\Omega_c$. Assuming that the same result applies for mixtures of multivariate Gaussian distributions, the same strategy could be performed (McLachlan & Peel, 2000, p.97). The constrained parameter space for the multivariate case was suggested by Hathaway (1985):

$$\Omega_c = \{\psi \in \Omega \mid \pi_i \geq \epsilon,\ \lambda_m(\Sigma_i \Sigma_k^{-1}) \geq c,\ 1 \leq i \neq k \leq g\}, \qquad c > 0$$

where $\lambda_m$ refers to the minimum eigenvalue of a matrix. However, satisfying these constraints involves a difficult optimisation problem.

Ingrassia & Rocci (2007) suggested a simpler approach. They observed that:

$$\lambda_m(\Sigma_h \Sigma_j^{-1}) \geq \frac{\lambda_m(\Sigma_h)}{\lambda_s(\Sigma_j)}$$

where $\lambda_s$ refers to the maximum eigenvalue of a matrix.

Then, by imposing the constraints $a \leq \lambda_i(\Sigma_j) \leq b$, where $\lambda_i(\Sigma_j)$ is the $i$-th eigenvalue of the matrix $\Sigma_j$, and taking $a/b \geq c$:

$$\lambda_m(\Sigma_h \Sigma_j^{-1}) \geq \frac{\lambda_m(\Sigma_h)}{\lambda_s(\Sigma_j)} \geq \frac{a}{b} \geq c$$

which defines the constrained parameter space ($\Omega_{ab}$) (Ingrassia & Rocci, 2007):

$$\Omega_{ab} = \{\psi \mid \pi_i \geq \epsilon,\ a \leq \lambda_k(\Sigma_i) \leq b,\ k = 1 \ldots p \text{ and } i = 1 \ldots g\}$$

The constraints defining $\Omega_{ab}$ imply those defining $\Omega_c$ but result in a simpler optimisation problem.

The main difficulty is how to choose the tuning parameters $c$ and $\epsilon$ (univariate case) or $a$, $b$ and $\epsilon$ (multivariate case) to ensure that ψ belongs to the constrained parameter space (Hathaway, 1985). For this reason, some authors (e.g. McLachlan & Peel, 2000; Melnykov, 2013) recommend not imposing constraints and deleting the singularities and spuriosities post hoc. I have also decided to delete spuriosities and singularities rather than to restrict the domain of the parameters.
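The relation between the eigenvalue bounds of Ingrassia & Rocci (2007) and the constraint of Hathaway (1985) can be checked numerically for a pair of bivariate covariance matrices. The following Python sketch uses hypothetical matrices and bounds $a$ and $b$, and covers only the eigenvalue constraints, not the constraint on the mixing proportions; the helper names are illustrative.

```python
import math


def eig_2x2(m):
    """Real eigenvalues (largest first) of a 2x2 matrix via trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return (tr + disc) / 2, (tr - disc) / 2


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul_2x2(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def in_omega_ab(covs, a, b):
    """True if every eigenvalue of every covariance matrix lies in [a, b]."""
    return all(a <= lam <= b for cov in covs for lam in eig_2x2(cov))


sigma_h = [[2.0, 0.6], [0.6, 0.5]]  # hypothetical covariance matrices
sigma_j = [[1.0, 0.2], [0.2, 1.5]]
a, b = 0.2, 2.5                      # hypothetical eigenvalue bounds

lam_min_product = eig_2x2(matmul_2x2(sigma_h, inverse_2x2(sigma_j)))[1]
ratio_bound = eig_2x2(sigma_h)[1] / eig_2x2(sigma_j)[0]
print(in_omega_ab([sigma_h, sigma_j], a, b))
print(lam_min_product, ratio_bound, a / b)
```

For these matrices the three printed quantities appear in decreasing order, in line with the chain of inequalities above.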

3.9.1 Starting strategies for the EM algorithm

Different strategies have been proposed for initiating the EM algorithm (e.g. Biernacki et al., 2003; Karlis & Xekalaki, 2003; Maitra, 2009); a comprehensive review is given in McLachlan & Peel (2000, p.54-55). However, none of them outperforms the others for all the applications (Meila & Heckerman, 2001; Melnykov, 2013). The starting strategies used in this current project are listed below.

1. Random starts. This strategy is based on constructing a random partition of the data. For each $y_j$, a random value, $r$, is generated from the set $R = \{1 \ldots g\}$. Then, $y_j$ is assigned to $G_r$ by fixing $\tau_{rj} = 1$ and $\tau_{sj} = 0$ for $s = 1 \ldots g$ and $s \neq r$ (McLachlan & Peel, 2000, p.55).

2. Solution provided by the k-means algorithm. This procedure produces a partition of the data set based on minimising the distance between the observations and the means of the clusters. The objective function to minimise is $\sum_{i=1}^{g} \sum_{y_j \in G_i} d(y_j - \mu_i)$, where $d$ denotes the Euclidean distance (Forgy, 1965, as cited in Omran et al., 2007). Then, $\tau_{ij} = 1$ if $y_j$ is allocated to $G_i$ by the k-means algorithm and $\tau_{sj} = 0$, $s = 1 \ldots g$ and $s \neq i$ (McLachlan & Peel, 2000, p.54).

3. Simulated starts. The initial values of the joint means, $\mu_i^{(0)}$, $i = 1 \ldots g$, are generated from a bivariate normal distribution with mean $\bar{y} = \sum_{j=1}^{n} y_j / n$ and covariance matrix $S = \sum_{j=1}^{n} (y_j - \bar{y})^\top (y_j - \bar{y}) / n$. The initial values of the mixing proportions and the covariance matrices of the component densities are given by $\pi_i^{(0)} = 1/g$ and $\Sigma_i^{(0)} = S$, respectively (McLachlan & Peel, 2000, p.55).

4. Subsample solution. The EM algorithm is run from random starts on a subsample from the data sample. Then, the solution provided by the EM algorithm after a few runs on the subsample is used to initiate the EM algorithm on the entire sample. The subsample size needs to be big enough to avoid degenerate solutions (McLachlan & Peel, 2000, p.55).

5. Short run of the EM algorithm. This strategy is based on running several ‘short runs’ of the EM algorithm when the latter has been initiated with random starts. Then, the solution with the highest value of L(ψ) is used to perform a ‘long run’ of the EM algorithm (Biernacki et al., 2003, 2006). A ‘short run’ indicates that the threshold chosen to stop the EM algorithm is larger than that for a ‘long run’.

Strategies 1 and 2 are available in the R (R Core Team, 2012) package EMMIX (McLachlan et al., 1999); strategies 3, 4 and 5 have been programmed by the author (the code can be found in Appendices F and E).
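The random start (strategy 1) is simple enough to sketch in a few lines. The Python code below is an illustration only and is not the implementation used in EMMIX or in the appendices; the function name and the seed argument are assumptions, and clusters are indexed from 0 rather than from 1.

```python
import random


def random_start(n, g, seed=None):
    """Random partition of n observations into g clusters.

    Returns a g x n matrix tau with tau[r][j] = 1 if observation j is
    assigned to cluster r and 0 otherwise.
    """
    rng = random.Random(seed)
    tau = [[0.0] * n for _ in range(g)]
    for j in range(n):
        r = rng.randrange(g)   # random label from {0, ..., g-1}
        tau[r][j] = 1.0
    return tau


tau0 = random_start(n=5, g=2, seed=1)
for row in tau0:
    print(row)
```

Each column contains exactly one 1, so the matrix can be passed to an M-step as the initial posterior probabilities. A random partition may leave a cluster empty or nearly empty, which is one reason why several starts are normally used.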

3.10 Bayesian approach to estimating parameters of mixture models of multivariate Gaussian distributions

In this Section I review the Bayesian approach to estimating the parameters of mixture models of multivariate normal distributions. Refer to Lee (2012) and Peña (2002, Chapter 11) for an introduction to Bayesian statistics and to Frühwirth-Schnatter (2006) for a detailed explanation of the Bayesian approach in the context of mixture models.

Bayesian statistics considers the parameters of a population to be random variables for which a pdf, called the prior distribution, is specified. The prior distribution is chosen depending on the previous knowledge of the researcher. Then, the researcher updates his/her knowledge once the data are recorded. The combination of the researcher’s previous knowledge and his/her learning from the data is expressed by the conditional pdf of the parameters given the observed data, the so-called posterior distribution. The prior and posterior distributions are related by Bayes’ Theorem.

$$p(\psi \mid Y) = \frac{f(Y \mid \psi)\,p(\psi)}{\int f(Y \mid \psi)\,p(\psi)\,d\psi} \propto l(\psi \mid Y)\,p(\psi) \qquad (3.16)$$

where:
$\psi$ is the parameter vector.
$Y$ is the random sample.
$p(\psi \mid Y)$ is the posterior pdf.
$f(Y \mid \psi)$ is the joint pdf.
$p(\psi)$ is the prior pdf.
$l(\psi \mid Y)$ is the likelihood function.

Inference in Bayesian statistics is based on the posterior probability (Peña, 2002, p.329). For instance, ψ is estimated by taking the expected value of $p(\psi \mid Y)$, denoted as $E(\psi \mid Y)$ (Peña, 2002, p.329). In the case of mixture models, $E(\psi \mid Y)$ always exists in closed form, given that priors are proper$^7$ and conjugate$^8$ (Diebolt & Robert, 1994). However, calculating $E(\psi \mid Y)$ requires expanding the likelihood function (Eq. 3.3), which involves $g^n$ operations corresponding to all possible allocations of the data into $g$ clusters (Lee et al., 2008). This is computationally prohibitive for large sample sizes (Lee et al., 2008). Thus, ψ is estimated by calculating the mean of a random sample generated from the posterior distribution (Diebolt & Robert, 1994). The most common algorithm to generate the random sample is the Gibbs sampler (Lee et al., 2008). The fundamentals of this algorithm, its application and the issues associated with the estimation of ψ are briefly reviewed below.

$^7$ A proper prior is a prior which integrates to one (Peña, 2002, p.330).
$^8$ The prior distribution is called a conjugate prior if the posterior distribution belongs to the same parametric family as the prior (Peña, 2002, p.332).

3.10.1 The Gibbs sampler

The Gibbs sampler (Geman & Geman, 1984) is a Markov chain Monte Carlo (MCMC) method for generating a random sample from a joint pdf, e.g. $f_{xy}(x, y)$, by sampling from the conditional pdfs, $f_{x|y}(x \mid y)$ and $f_{y|x}(y \mid x)$. The steps of the Gibbs sampler in the bivariate case are as follows (Robert & Casella, 2004, Chapter 9).

Let $x^{(0)}$ be an initial value of the random variable $X$. The following steps are repeated for $r = 1 \ldots N$, where $r$ indicates the iteration number of the algorithm.

1. Generate $y^{(r)}$ from the distribution $f_{y|x}(\cdot \mid x^{(r-1)})$

2. Generate $x^{(r)}$ from the distribution $f_{x|y}(\cdot \mid y^{(r)})$

After $n$ initial steps, with $n$ large and $1 \leq n \leq N$, $(x^{(n)}, y^{(n)})$ is a random value from the pdf $f_{xy}(x, y)$ (Diebolt & Robert, 1994).
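The two steps above can be illustrated with a toy target for which both conditional distributions are known. The Python sketch below assumes a standard bivariate normal distribution with correlation ρ, for which $X \mid Y = y \sim N(\rho y, 1 - \rho^2)$ and symmetrically for $Y \mid X = x$; this target, the function names and the settings are chosen only for illustration and are unrelated to the mixture posterior.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)
    x = 0.0                       # initial value x^(0)
    draws = []
    for r in range(1, n_iter + 1):
        y = rng.gauss(rho * x, sd)   # step 1: y^(r) given x^(r-1)
        x = rng.gauss(rho * y, sd)   # step 2: x^(r) given y^(r)
        if r > burn_in:              # discard the initial iterations
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    n = len(draws)
    mx = sum(d[0] for d in draws) / n
    my = sum(d[1] for d in draws) / n
    sxy = sum((d[0] - mx) * (d[1] - my) for d in draws)
    sxx = sum((d[0] - mx) ** 2 for d in draws)
    syy = sum((d[1] - my) ** 2 for d in draws)
    return sxy / math.sqrt(sxx * syy)


draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000, burn_in=1000)
print(sample_correlation(draws))
```

After the burn-in, the sample correlation of the retained draws should be close to the target value of 0.8, even though no draw was ever taken from the joint distribution directly.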

3.10.2 The Gibbs sampler for a mixture of multivariate Gaus- sian distributions

The Gibbs sampler is used to generate $(z^{(1)}, \ldots, z^{(N)}, \psi_g^{(1)}, \ldots, \psi_g^{(N)})$, a sample of size $N$ from the posterior distribution $p(Z, \psi_g \mid Y)$ (Peña, 2002, p.476). The subindex $g$ is used to indicate that the number of mixture components is equal to $g$. The sample $(z^{(1)}, \ldots, z^{(N)}, \psi_g^{(1)}, \ldots, \psi_g^{(N)})$ is obtained by generating $(z^{(1)}, \ldots, z^{(N)})$, a sample from $p(Z \mid \psi_g, Y)$, and $(\psi_g^{(1)}, \ldots, \psi_g^{(N)})$, a sample from $p(\psi_g \mid Z, Y)$ (Peña, 2002, p.476). After $n$ initial iterations of the algorithm, $n$ being sufficiently large and $1 \leq n \leq N$, $(z^{(n)}, \ldots, z^{(N)}, \psi_g^{(n)}, \ldots, \psi_g^{(N)})$ is a random sample from $p(Z, \psi_g \mid Y)$ (Diebolt & Robert, 1994). In addition, $(\psi_g^{(n)}, \ldots, \psi_g^{(N)})$ is a random sample from $p(\psi_g \mid Y)$ (Robert & Casella, 2004, p.339). As pointed out by Diebolt & Robert (1994), $(\psi_g^{(n)}, \ldots, \psi_g^{(N)})$ can be used to approximate the posterior expected value, $E_p(\psi_g \mid Y)$, as follows.

$$E_p(\psi_g \mid Y) \approx \frac{1}{N - n} \sum_{k=n+1}^{N} \psi_g^{(k)}$$

The computation of posterior pdfs requires specifying the priors for the mixture parameters (see Eq. 3.16). Bensmail et al. (1997) suggested the use of conjugate priors:

$$\pi \sim D(\alpha_1, \ldots, \alpha_g)$$

$$\mu_i \sim MVN(\xi_i, \Sigma_i / k_i)$$

$$\Sigma_i^{-1} \sim W_p(m_i, C_i)$$

where $D$ refers to the Dirichlet distribution (Eq. 3.17), $MVN$ to the multivariate normal and $W_p$ to the Wishart distribution (Eq. 3.18).

$$f_D(x \mid \alpha_1 \ldots \alpha_g) = c\,x_1^{\alpha_1 - 1} \cdots x_g^{\alpha_g - 1} \qquad (3.17)$$

where:

$$\sum_{i=1}^{g} x_i = 1 \quad \text{and} \quad c = \frac{\Gamma(\sum_{i=1}^{g} \alpha_i)}{\prod_{i=1}^{g} \Gamma(\alpha_i)}$$

$$f_w(X \mid m, C) = \frac{|X|^{\frac{m - p - 1}{2}}\,|C|^{m} \exp\{-tr(CX)\}}{\Gamma_p(m)} \qquad (3.18)$$

with:

$$\Gamma_p(m) = \pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\left(\frac{2m + 1 - k}{2}\right)$$

where $\Gamma$ is the gamma function, $p$ is the dimension of the matrix $X_{p \times p}$ and $tr$ is the trace function.

In particular, Bensmail et al. (1997) used $\alpha_i = 1/g$, $k_i = 1$, $m_i = 5$, and $\xi_i$ and $C_i$ as the mean and covariance matrix of the entire sample, respectively. Under these priors, the Gibbs sampler for mixtures of multivariate normal distributions is set up as follows (Bensmail, 1997; McLachlan & Peel, 2000, p.123).

Let $z^{(0)}$ be an initial partition of the data into $g$ clusters. The following steps are repeated for iterations $r = 1 \ldots N$.

1. Generate:

$$\pi^{(r)} \sim D(\alpha_1 + n_1^{(r-1)}, \ldots, \alpha_g + n_g^{(r-1)})$$

$$\mu_i^{(r)} \sim MVN\left(\xi_i^{*(r-1)}, \frac{1}{n_i^{(r-1)} + k_i}\,\Sigma_i^{(r-1)}\right)$$

$$\Sigma_i^{-1(r)} \sim W_p(n_i^{(r-1)} + m_i, C_i^{*(r-1)})$$

where:

$$n_i = \sum_{j=1}^{n} z_{ij}$$

$$\xi_i^{*} = \frac{n_i \bar{y}_i + k_i \xi_i}{n_i + k_i}$$

$$C_i^{*} = \left\{C_i^{-1} + n_i S_i + \frac{n_i k_i}{n_i + k_i} (\bar{y}_i - \xi_i)^\top (\bar{y}_i - \xi_i)\right\}^{-1}$$

$$\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij}\,y_j$$

$$S_i = \frac{1}{n_i} \sum_{j=1}^{n} z_{ij}\,(y_j - \bar{y}_i)^\top (y_j - \bar{y}_i)$$

2. Generate:

$$Z_j^{(r)} \sim Mult_g(1, \tau_j^{(r)})$$

where $Mult_g$ is a multinomial distribution with 1 trial, $g$ outcomes and $\tau_j$ the probability vector of the outcomes (Eq. 3.2).

3. Calculate $n_i^{(r)}$, $\bar{y}_i^{(r)}$ and $S_i^{(r)}$
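Part of the bookkeeping in these steps can be sketched without any sampling. The Python code below computes $n_i$, $\bar{y}_i$ and the updated location $\xi_i^{*}$ for one component from a given allocation of the observations; it does not draw from the Dirichlet, multivariate normal or Wishart distributions, and the data, labels, prior values and function name are all hypothetical.

```python
def posterior_location(data, labels, component, xi, k):
    """Return n_i, the cluster mean and xi_i* for one component.

    data: list of observations (tuples); labels: allocated component of
    each observation; xi, k: prior location and prior precision factor.
    """
    members = [y for y, z in zip(data, labels) if z == component]
    n_i = len(members)
    if n_i == 0:
        raise ValueError("component has no allocated observations")
    dim = len(xi)
    ybar = tuple(sum(y[d] for y in members) / n_i for d in range(dim))
    xi_star = tuple((n_i * ybar[d] + k * xi[d]) / (n_i + k) for d in range(dim))
    return n_i, ybar, xi_star


data = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]   # hypothetical observations
labels = [0, 0, 1]                            # hypothetical allocation z
n_0, ybar_0, xi_star_0 = posterior_location(data, labels, 0, xi=(0.0, 0.0), k=1.0)
print(n_0, ybar_0, xi_star_0)
```

The updated location is a weighted average of the cluster mean and the prior location, with the cluster mean dominating as the number of allocated observations grows relative to $k_i$.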

The estimation of the parameters by the Gibbs sampler involves dealing with an important pitfall, the so-called ‘label switching problem’ (Redner & Walker, 1984).

3.10.3 Label switching problem

A well-known mathematical feature of mixture models is their non-identifiability (Marin et al., 2005; McLachlan & Peel, 2000; Redner & Walker, 1984). A parametric family $F$ is said to be identifiable if, given $\psi_1$ and $\psi_2$, two values of the parameter vector, and $f(y \mid \psi_1), f(y \mid \psi_2) \in F$ such that $f(y \mid \psi_1) = f(y \mid \psi_2)$, then $\psi_1 = \psi_2$. The parametric family of mixture models is non-identifiable because the mixture density is invariant to a permutation of the parameters (Marin et al., 2005; McLachlan & Peel, 2000; Redner & Walker, 1984). This type of non-identifiability was denoted by Redner & Walker (1984) as the ‘label switching problem’. For instance, let us observe that:

$$\pi_1 f_1(y \mid \theta_1) + \pi_2 f_2(y \mid \theta_2) = \pi_2 f_2(y \mid \theta_2) + \pi_1 f_1(y \mid \theta_1), \quad \text{but} \quad (\pi_1, \theta_1, \pi_2, \theta_2) \neq (\pi_2, \theta_2, \pi_1, \theta_1)$$

The ‘label switching problem’ has serious implications for the estimation of ψ by the Bayesian approach. The invariance of the mixture to permutations results in a posterior distribution with $g!$ modes, which is thus difficult to explore by the Gibbs sampler or other Monte Carlo methods (Lee et al., 2008). Furthermore, if there is no previous knowledge available for discerning the components of the mixture, it is common to take a prior invariant to a permutation of the parameters (Stephens, 2000). However, the latter practice results in posterior expected values of $\pi_i$ and $\theta_i$ equal for all the components (Marin et al., 2005; Stephens, 2000). A comprehensive explanation of this fact is given by Frühwirth-Schnatter (2006, p.64), which I present here.

Firstly, let us observe that the likelihood function is invariant to a permutation of the parameters. In the simple case of two components:

$$l(y \mid \psi) = \prod_{i=1}^{n} [\pi_1 f_1(y_i \mid \theta_1) + \pi_2 f_2(y_i \mid \theta_2)] = \prod_{i=1}^{n} [\pi_2 f_2(y_i \mid \theta_2) + \pi_1 f_1(y_i \mid \theta_1)]$$

Let us now consider $g$ being a fixed number of components and $\eta$ a permutation on the set $G = \{1, \ldots, g\}$. If we define $\psi_\eta = (\pi_{\eta(1)}, \ldots, \pi_{\eta(g)}, \theta_{\eta(1)}, \ldots, \theta_{\eta(g)})$ and take a prior such that $p(\psi_\eta) = p(\psi)$, then it is easy to see that:

$$p(\psi \mid y) = k\,l(y \mid \psi)\,p(\psi) = k\,l(y \mid \psi_\eta)\,p(\psi_\eta) = p(\psi_\eta \mid y) \qquad (3.19)$$

Furthermore, the marginal posterior distributions are also invariant to a permutation of the parameters, $p(\theta_i \mid y) = p(\theta_{\eta(i)} \mid y)$. This is demonstrated as follows.

$$p(\theta_i \mid y) = \int_{\Theta^{g-1} \times [0,1]^{g}} p(\psi \mid y)\,d(\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_g, \pi_1 \ldots \pi_g) \qquad (3.20)$$

where $\Theta$ is the parameter space of $\theta_i$, $\forall i = 1 \ldots g$. Let us transform the previous integral by applying the permutation $\eta$. Taking into account that the determinant of the Jacobian matrix of a permutation is equal to one in absolute value and that the parameter space does not change under a permutation:

$$p(\theta_i \mid y) = \int_{\Theta^{g-1} \times [0,1]^{g}} p(\psi_\eta \mid y)\,d(\theta_{\eta(1)}, \ldots, \theta_{\eta(i-1)}, \theta_{\eta(i+1)}, \ldots, \theta_{\eta(g)}, \pi_{\eta(1)} \ldots \pi_{\eta(g)}) = p(\theta_{\eta(i)} \mid y) \qquad (3.21)$$

The fact that $p(\theta_i \mid y) = p(\theta_{\eta(i)} \mid y)$ implies that $E(\theta_i \mid y) = E(\theta_{\eta(i)} \mid y)$. The latter equality holds for all the permutations of the set $G$. Then, the estimates of the component parameters are identical for any data set, which is an unsatisfactory result for the estimation problem (Marin et al., 2005).
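The invariance argument can be illustrated numerically for a two-component univariate normal mixture. In the Python sketch below the parameter values are hypothetical; the last lines only mimic a chain that visits both labellings equally often, and are not a simulation from a posterior distribution.

```python
import math


def normal_pdf(y, mean, sd):
    return math.exp(-((y - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def mixture_pdf(y, params):
    """Mixture density; params is a list of (pi, mean, sd) per component."""
    return sum(pi * normal_pdf(y, mean, sd) for pi, mean, sd in params)


psi = [(0.3, 0.0, 1.0), (0.7, 4.0, 2.0)]   # hypothetical parameter values
psi_swapped = [psi[1], psi[0]]             # same mixture with labels permuted

print(mixture_pdf(1.3, psi), mixture_pdf(1.3, psi_swapped))

# Averaging the component means over both labellings gives the same
# value for "component 1" and "component 2".
averaged_means = [(psi[i][1] + psi_swapped[i][1]) / 2 for i in range(2)]
print(averaged_means)
```

The density is unchanged by the relabelling although the parameter vectors differ, and the averaged component means coincide, which is the situation described above for the posterior expected values.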

Some authors imposed constraints on the parameter space to ensure the uniqueness of labelling (e.g. Lenk & DeSarbo, 2000). However, defining valid constraints can be a challenge when $p > 1$ (Frühwirth-Schnatter, 2006, p.20). Furthermore, these constraints can interfere with the ability of the algorithm to explore the posterior surface or with the convergence of the algorithm (Lee et al., 2008). The latest efforts have been focused on constructing relabelling algorithms which reorder the chain $(\psi^{(n)}, \ldots, \psi^{(N)})$ ‘a posteriori’ (e.g. Marin et al., 2005; Stephens, 2000).

Even if the approach of reordering the chain of estimates ‘a posteriori’ is able to solve the ‘label switching problem’, the Gibbs sampler, like other Monte Carlo methods, may be trapped in some local modes of the posterior distribution and then become unable to reproduce the posterior surface (Lee et al., 2008). In addition, the Bayesian approach is computationally more costly than the frequentist procedure because it requires a post-treatment of the Monte Carlo chain. Due to these difficulties, I have chosen to use the EM algorithm, which does not need to deal with the issue of exchangeable components (Jasra et al., 2005; McLachlan & Peel, 2000, p.27).

3.11 Selecting the number of mixture components

Selecting the number of components is a difficult statistical problem, which has been broadly studied with no unequivocal method outperforming the others for all the applications (McLachlan & Peel, 2000, p.175). The main approaches are based on 1) information criteria, and 2) testing procedures (Melnykov & Maitra, 2010). Both alternatives are briefly described here and a full discussion can be found in McLachlan & Peel (2000, Chapter 6).

3.11.1 Information Criteria

Including more components in the mixture increases the value of L(ψ) but leads to more complex models (Figueiredo & Jain, 2002). Information criteria choose the model which provides the highest value of L(ψ) while penalising for the complexity of the model. Therefore, all the information criteria have a generic expression:

$$-2L(\hat{\psi}) + 2C \qquad (3.22)$$

where $L(\hat{\psi})$ is the value of the log likelihood function at $\hat{\psi}$ and $C$ is a complexity penalty term.

There is a large number of information criteria. For instance, Akaike’s Information Criterion (AIC) (Akaike, 1973, 1974); Bayesian Information Criterion (BIC) (Schwarz, 1978); Integrated Classification Likelihood Criterion (ICL) (Biernacki et al., 2000); Bayesian Information Criterion type approximation of the Integrated Classification Likelihood Criterion (ICL-BIC) (Biernacki et al., 2000); Normalized Entropy Criterion (NEC) (Celeux & Soromenho, 1996); Bootstrap-Based Information Criterion (EIC) (Ishiguro et al., 1997); Cross-Validation-Based Information Criterion (CVIC) (Smyth, 2000); Informational Complexity Criterion (ICOMP) (Bozdogan, 1990, 1993); Approximate Weight of Evidence (AWE) (Banfield & Raftery, 1993) and Minimum Message Length Criterion for mixtures (L) (Figueiredo & Jain, 2002).

Regardless of the information criterion used, the selection of the number of components is performed as follows (e.g. Fraley & Raftery, 1998; McLachlan & Peel, 2000, p. 219). Decide on the maximum number of components to fit. Then, for each number of components $g$, from the maximum down to one, perform the following steps:

• Choose $\hat{\psi}_g$ according to the procedure detailed in Section 3.9. The subindex $g$ indicates that the mixture has $g$ components.

• Compute the information criterion chosen.

The number of components selected is the one which minimises the information criterion chosen.

In this thesis, the information criteria used to select the number of components were:

• AIC (Akaike, 1973, 1974):

$$AIC = -2L(\hat{\psi}) + 2d$$

• BIC (Schwarz, 1978):

$$BIC = -2L(\hat{\psi}) + d \log n$$

where $d$ is the number of parameters in the mixture and $n$ is the sample size. Fonseca & Cardoso (2007) showed that BIC had the best performance in selecting the number of clusters for mixtures of multivariate Gaussian distributions in an experiment carried out with 42 data sets with the EM algorithm.
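The selection by AIC or BIC can be sketched as follows. This Python illustration assumes a mixture of $g$ $p$-variate Gaussian components with unrestricted covariance matrices, for which the number of free parameters is $(g-1) + gp + gp(p+1)/2$; the maximised log likelihood values and the sample size are hypothetical and would in practice come from the procedure in Section 3.9.

```python
import math


def n_parameters(g, p):
    """Free parameters: mixing proportions, means and covariance matrices."""
    return (g - 1) + g * p + g * p * (p + 1) // 2


def aic(loglik, d):
    return -2 * loglik + 2 * d


def bic(loglik, d, n):
    return -2 * loglik + d * math.log(n)


# Hypothetical maximised log likelihoods for g = 1, 2, 3 components.
logliks = {1: -520.0, 2: -480.0, 3: -476.0}
p, n = 2, 100

bic_values = {g: bic(ll, n_parameters(g, p), n) for g, ll in logliks.items()}
aic_values = {g: aic(ll, n_parameters(g, p)) for g, ll in logliks.items()}
best_bic = min(bic_values, key=bic_values.get)
best_aic = min(aic_values, key=aic_values.get)
print(best_bic, best_aic)
```

In this made-up example the third component improves the log likelihood only slightly, so the penalty term dominates and both criteria select two components.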

The main inconvenience of all the information criteria is that they do not provide a measure of confidence for choosing a particular number of components (McLachlan & Peel, 2000, p.184). Such a measure can be obtained by applying bootstrap techniques to the likelihood ratio test statistic (McLachlan, 1987).

3.11.2 Likelihood ratio test for selecting the number of clusters

The selection of the number of components can be performed using the likelihood ratio test statistic (λ). The null hypothesis $H_0: g = g_0$ is tested against the alternative one $H_A: g = g_0 + 1$.

$$\lambda = \frac{\sup_{\psi \in H_0} L(\psi)}{\sup_{\psi \in H_A} L(\psi)} = \frac{L(\hat{\psi}_{g_0})}{L(\hat{\psi}_{g_0+1})}$$

In this ratio, L denotes the likelihood function itself, so that $-2\log\lambda$ is a difference of log likelihoods. $H_0$ is rejected when λ is small or, equivalently, when $-2\log\lambda$ is large. The statistic $-2\log\lambda$ usually has an asymptotic $\chi^2_d$ distribution, where $d$ is the difference between the number of parameters under $H_A$ and $H_0$ (Wilks, 1938). This result assumes that the parameter space of the null hypothesis ($\Theta_0$) is in the interior$^9$ and an identifiable subset of the parameter space ($\Theta$)$^{10}$, which is not fulfilled for mixture models (Ghosh & Sen, 1985; McLachlan & Peel, 2000, p.185-186).

$^9$ Interior(S) = S − Boundary(S), S being a set.
$^{10}$ Non-identifiable: the null hypothesis holds for two different values of the parameters belonging to the parameter space of the alternative hypothesis (McLachlan & Peel, 2000, p.185).

Let us consider a simple example given in Ghosh & Sen (1985), which I detail here. Imagine we want to test:

$$H_0: f(y, \theta_0), \text{ with } \theta_0 \text{ fixed, against } H_A: \pi f(y, \theta_1) + (1 - \pi) f(y, \theta_2)$$

In this example, $\Theta$ and $\Theta_0$ are given by:

$$\Theta = [0, 1] \times S_1 \times S_2$$

where $S_1$ and $S_2$ are the parameter spaces of $\theta_1$ and $\theta_2$, respectively.

$$\Theta_0 = (\{1\} \times \{\theta_0\} \times S_2) \cup (\{0\} \times S_1 \times \{\theta_0\}) \cup ([0, 1] \times \{\theta_0\} \times \{\theta_0\})$$

Note that $\Theta_0$ is on the boundary of $\Theta$ (Ghosh & Sen, 1985). Furthermore, if $H_0$ is true, $f(y, \theta_0)$ is represented by three densities of the form $\pi f(y, \theta_1) + (1 - \pi) f(y, \theta_2)$: 1) $\pi = 1$ and $\theta_1 = \theta_0$, 2) $\pi = 0$ and $\theta_2 = \theta_0$ and 3) $\pi \in\ ]0, 1[$ and $\theta_1 = \theta_2 = \theta_0$ (Ghosh & Sen, 1985). Thus, $\Theta_0$ is in a non-identifiable subset of $\Theta$ (Ghosh & Sen, 1985).

The breakdown of the regularity conditions implies that the distribution of $-2\log\lambda$ is in general unknown. McLachlan (1987) proposed to use bootstrap techniques to approximate this distribution as follows.

1. Apply the EM algorithm to the recorded data to find the MLE $\hat{\psi}_{g_0} = (\hat{\pi}_1, \ldots, \hat{\pi}_{g_0}, \hat{\theta}_1, \ldots, \hat{\theta}_{g_0})$ by the procedure in Section 3.9.

2. Generate $B$ bootstrap samples of size $n$ from the finite mixture model given by:

$$y_1^b, \ldots, y_n^b \sim \sum_{i=1}^{g_0} \hat{\pi}_i f_i(y \mid \hat{\theta}_i), \quad b = 1 \ldots B$$

3. Apply the EM algorithm to the bootstrap sample $y_1^b, \ldots, y_n^b$ with $g_0$ and $g_1 = g_0 + 1$ components and calculate:

$$-2\log\lambda_b = 2\log L(\hat{\psi}_{g_1})_b - 2\log L(\hat{\psi}_{g_0})_b, \quad b = 1, \ldots, B.$$

4. Order the values of $-2\log\lambda_b$ from the smallest to the largest to obtain the order statistics.

5. Calculate the value of the likelihood ratio test statistic for the original sample, $-2\log\lambda_s = 2\log L(\hat{\psi}_{g_1}) - 2\log L(\hat{\psi}_{g_0})$, and compare it with the $k$-th order statistic, $-2\log\lambda_{(k)}$, $k = 1 \ldots B$, to get a significance level of $\alpha = 1 - \frac{k}{B+1}$.

6. The null hypothesis is rejected if $-2\log\lambda_s$ is larger than $-2\log\lambda_{(k)}$.
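Steps 4 to 6 reduce to sorting the bootstrap statistics and comparing the observed statistic with one order statistic. The Python sketch below assumes that the $B$ bootstrap values of $-2\log\lambda_b$ have already been computed (steps 1 to 3 require an EM implementation and are not shown); the function name and the numerical values are hypothetical.

```python
def bootstrap_lrt(observed, bootstrap_stats, k):
    """Bootstrap likelihood ratio test decision.

    observed: -2 log lambda for the original sample.
    bootstrap_stats: the B bootstrap values of -2 log lambda.
    k: index (1-based) of the order statistic used as critical value.
    Returns the significance level alpha = 1 - k / (B + 1) and the decision.
    """
    b = len(bootstrap_stats)
    ordered = sorted(bootstrap_stats)          # step 4: order statistics
    alpha = 1 - k / (b + 1)                    # step 5: significance level
    reject = observed > ordered[k - 1]         # step 6: decision
    return alpha, reject


stats = [float(v) for v in range(1, 100)]      # hypothetical values, B = 99
alpha, reject = bootstrap_lrt(observed=96.5, bootstrap_stats=stats, k=95)
print(alpha, reject)
```

With $B = 99$ and $k = 95$ the significance level is approximately 0.05, which shows why $B$ is often chosen so that $B + 1$ is a convenient multiple of the desired level.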

The main disadvantages of the bootstrap approach are: a) its computational cost, b) its dependence on the strategies used to handle spurious cluster partitions, to initiate and to stop the algorithm and c) it provides only an approximation of the true distribution of $-2\log\lambda$ (McLachlan & Peel, 2000, p.193).

3.12 Summary

In this chapter I have reviewed the fundamentals of the methodology of finite mixture models. Finite mixture models are used to model non-standard distributional shapes. For instance, Xu et al. (2010) used finite mixture models of univariate Gaussian distributions to model the pdf of abortion time from randomly selected dairy cows. Another important application is the classification of observations into groups. A very early example of this application can be found in Pearson (1894), who applied mixture models to discover subspecies among a population of blue crabs.

The modelling of pdf shapes and the classification of observations into groups require estimating the mixture parameters. These estimates can be obtained by frequentist or Bayesian procedures. Under the frequentist approach, the estimation of the parameters is based on the maximum likelihood method and the application of the EM algorithm. The log likelihood function is multimodal, so the EM algorithm needs to be initiated from different starting values to widen the search for local maximisers. Then, the local maximum with the highest value of the log likelihood, after the deletion of spuriosities and singularities, is chosen as the maximum likelihood estimate. The Bayesian approach requires applying the Gibbs sampler algorithm to generate a random sample from the posterior distribution of the parameters. Then, the estimates of the mixture parameters are obtained by taking the mean of the random sample. In this thesis, I use the frequentist approach because it is computationally less costly than the Bayesian one and does not require dealing with the problem of interchange of mixture components. The frequentist approach for fitting mixtures can be implemented with the R packages mixtools (Benaglia et al., 2009) and mclust (Fraley & Raftery, 1999) (for mixtures of univariate Gaussian distributions) or EMMIX (McLachlan et al., 1999) (for mixtures of multivariate Gaussian distributions).

After having introduced all the necessary details of the technique, in the next chapter I apply the proposed methodology for clustering the (NU, GY) field data collected across environments in a real-life case study. The inspection of the mixture groups could reveal environmental conditions affecting (NU, GY). By fitting mixture models one can estimate the expected value (mean) and the variance (degree of dispersion around the mean) of each of NU and GY as well as their correlation (degree of association between both traits) for each group (environment). Furthermore, if the researcher still wants to estimate IEN, bivariate mixture models allow one to do so by taking the ratio of the estimated means and calculating confidence sets according to the procedures detailed in Chapter 2. Chapter 4 is written as a manuscript according to the guidelines of the Australian and New Zealand Journal of Statistics.

Chapter 4

Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool

This chapter is presented according to the requirements of submission to the Australian and New Zealand Journal of Statistics.

Statement of authorship and author contributions

Title of Paper: Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool
Publication status: In preparation for submission

By signing the Statement of Authorship, each author certifies that their stated contribution to the publication is accurate and that permission is granted for the publication to be included in the candidate’s thesis.

Name of the Principal Author: Isabel Munoz-Santa
Contribution to the paper: Originated the idea of the analysis, developed the methodology, undertook critical review of the relevant literature, analysed and interpreted the case-study data, developed the computational code, wrote and edited the manuscript and will act as corresponding author
Signature and date:

Name of Co-Author: Petra Marschner
Contribution to the paper: Supervised the development of the work in its biological aspects, provided expertise for the interpretation of biological consequences of the analyses, contributed to the editing of the manuscript as appropriate
Signature and date:

Name of Co-Author: S.M. Haefele
Contribution to the paper: Supplied the data set, provided expertise for the interpretation of biological consequences of the analysis, contributed to the editing of the manuscript as appropriate
Signature and date:

Name of Co-Author: O. Kravchuk
Contribution to the paper: Trained the first author in interpreting statistical aspects as appropriate, contributed to the original discussion of the research ideas, contributed to the editing of the manuscript as appropriate
Signature and date:

Aust. N. Z. J. Stat. 2014 doi: 10.1111/j.1467-842X.XXX

Bivariate models for internal nitrogen use efficiency: mixture models as an exploratory tool

I. MUNOZ-SANTA 1, P. MARSCHNER 2, S.M. HAEFELE 3, O. KRAVCHUK 1

University of Adelaide, Australian Centre for Plant Functional Genomics

Summary

Internal nitrogen use efficiency (IEN) in cereals, defined as the ratio of grain yield (GY) to nitrogen uptake (NU), is an important trait in agronomy and plant and soil science research. In this study we discuss the application of bivariate mixed and mixture models to the analysis of GY and NU field data and compare them to the univariate mixed and mixture models for IEN. The bivariate analyses preserve the information on the GY and NU traits, and avoid dealing with the abnormality issues of ratios. Bivariate mixture models are proposed as a classification tool for identifying field conditions affecting the utilisation of nitrogen. Due to the design constraints on the collection of data in agricultural field trials, the bivariate mixture technique is suggested as supplementary to bivariate mixed models. The mixture methodology is demonstrated on a case study of rice research in northeast Thailand, for which the technique proved useful for exploring environmental conditions, in particular soil fertility and water availability. The bivariate mixture methodology may be applicable to other efficiency indices in agricultural research.

Key words: bivariate analysis; classification and discrimination; cluster analysis; EM algorithm; internal nitrogen use efficiency; mixed model; mixtures

1 The University of Adelaide, School of Agriculture Food and Wine. Waite Building (Waite Campus), Waite Rd, Urrbrae SA 5064, Australia
2 The University of Adelaide, School of Agriculture Food and Wine. Prescott Building (Waite Campus), Glen Osmond SA 5064, Australia
3 Australian Centre for Plant Functional Genomics (Waite Campus), Hartley Grove, Urrbrae, SA 5064, Australia
∗ Author for correspondence: Munoz-Santa, I., e-mail: [email protected], telephone: +61406815920, facsimile: +61(0)883137109

Acknowledgment. The authors acknowledge the Faculty of Science of the University of Adelaide for the Masters of Research scholarship for the first author and Paul Eckermann for his comments on the draft of the manuscript and help with ASReml-R.

© 2014 Australian Statistical Publishing Association Inc. Published by Wiley Publishing Asia Pty Ltd.

Prepared using anzsauth.cls [Version: 2014/01/06 v1.01]

1. Introduction

Nitrogen (N) is an elementary constituent of the nucleotides and proteins of cereals (Xu et al. 2012), but only a fraction of the N in soil is available to cereal plant roots. The availability of N depends on complex interactions between soil, plant and environmental factors (Marschner 2012, p. 315). Once N is absorbed by roots, it is transported to the rest of the plant, and a portion of this N is stored in the grain (Xu et al. 2012). To quantify the efficiency of N utilisation by plants, several N efficiency indices have been defined. In the context of growing demand for cereals worldwide and limited agricultural land, a better understanding of N efficiency is a research priority among plant and soil scientists.

Among several N efficiency indices, this study focuses on internal nitrogen use efficiency (IEN). Internal nitrogen use efficiency is defined as the ratio of grain yield (GY) to nitrogen uptake (NU), i.e. the content of N in the aboveground biomass. In agricultural research, this index expresses the ability of plants to utilise NU for grain production. However, how NU is utilised for grain is a complex process, governed by environmental factors, plant genetics and agronomic practices (Cassman et al. 2002).

In agricultural field trials, GY and NU are measured at plot level at harvest. At this stage, a typical scatter of the GY and NU data for a particular cultivar grown under a range of conditions exhibits an increasing monotone linear-plateau shape (e.g. Witt et al. 1999; Naklang et al. 2006). However, this shape may result from an overlay of growth processes under the different conditions rather than from a direct functional response of GY to NU. As environmental conditions vary, the process of NU utilisation for GY changes, and so does the degree of association (correlation) between these two traits. Since a fraction of NU is utilised for seed formation (Xu et al. 2012), one would expect the correlation between NU and GY to be positive or zero.
A negative correlation may occur when an excess of N in plants adversely affects the synthesis of proteins as well as plant health and growth pattern (Hauck 1984, p. 111). However, an excess of N, arguably caused by an oversupply of N fertiliser (Hauck 1984, p. 97), is not common in sustainable agriculture nowadays and will not be considered in this work.

At harvest, conditional on major environmental factors and in the absence of strong competition among plants, NU and GY at plot level can be seen as a cumulative effect of a large number of independent random variables with finite variances. By the Central Limit Theorem (see Cramer 1946, p. 219), each of NU and GY is thus expected to follow a normal distribution. Expanding this result to the bivariate case (see Cramer 1946, p. 286), the joint distribution of (NU, GY) is expected to be bivariate normal. Then, the probability density function of IEN is a mixture of two heavy-tailed distributions (Marsaglia 1965, 2006) and the shape of the IEN distribution can vary from normal-like to skewed or bimodal depending on the means and variances of NU, GY and their correlation (Marsaglia 1965, 2006). The effects of these parameters on the distributional shape of the ratio cannot be easily untangled.

Despite their intrinsic abnormalities, ratios are commonly used among plant and soil scientists. In published research in agriculture, IEN is commonly computed for each experimental plot and analysed by univariate linear models (e.g. Peng et al. 1996; Delogu et al. 1998; Fang et al. 2006; Naklang et al. 2006). However, univariate linear models on the ratio do not maintain information on the original traits, including their correlation, which limits the interpretability of these models. A more adequate approach is bivariate analysis of (NU, GY). Bivariate analyses preserve the information on the original traits, thus giving more insight into the mechanism of NU utilisation for GY. Furthermore, bivariate analyses avoid dealing with the abnormality issues of the ratio. Among bivariate analyses, estimating and testing effects of treatments can be done algebraically with multivariate analyses of variance or numerically through residual maximum likelihood analyses. Evidence for the advantages of bivariate analyses over univariate analyses on a ratio is provided by a recent paper by Ganesalingam et al. (2013), who analysed data on canola survival of blackleg disease in randomised complete block experiments. Their bivariate linear mixed model better utilised the experimental data, allowed greater flexibility in modelling spatial correlations and increased the accuracy of variety survival predictions.

At present, IEN data are predominantly collected in designed field trials where treatments are different fertiliser applications (e.g. Peng et al. 1996; Delogu et al. 1998; Fang et al. 2006; Naklang et al. 2006). However, as outlined earlier, the utilisation of NU for GY depends on nutrient availability, which can differ substantially from the amount of nutrients applied. Availability of nutrients, including N, depends on complex processes governed by many environmental factors such as local microbial activity, soil characteristics and water availability (Marschner 2012, p. 315). In the field, these factors are beyond the investigator’s control and may induce different levels of available nutrients even for plots receiving the same fertiliser treatment. The effect of environmental conditions may overcome, or interact or be confounded with, the treatments, complicating the interpretation of treatment-based analyses. In this study, we argue that such non-controlled conditions may lead to very different patterns of NU utilisation for GY even for the same treatment and thus to groups in the data different to treatment groups. Identifying such groups in data collected across different environments may complement treatment-based analyses when the objective is to gain insight into environmental drivers of NU and GY . Thus, it is important to investigate the benefits of finite mixture models of bivariate Gaussian distributions as a clustering technique for (NU, GY ) field data. Finite mixture models are a flexible statistical tool used in a large number of fields to model data sampled from different groups or to approximate unusual distributional shapes (Figueiredo & Jain 2002). In agriculture, finite mixture models have been used only occasionally. For instance, univariate mixture models have been used to model the distribution of abortion time in randomly selected dairy cows (Xu et al. 2010). 
Bivariate mixture models have been employed to classify soybean genotypes with respect to seed yield and seed protein in randomised complete block designs (McLachlan & Basford 1988). To the best of the authors’ knowledge, however, finite mixture models of bivariate normal distributions have not been used for analysing NU and GY field data.

The analysis of (NU, GY) data by bivariate mixture models has three advantages: 1) avoidance of the abnormalities of the IEN ratio, 2) identification of groups in the presence of strong environmental factors and 3) ability to consider changes in the correlation between NU and GY. Furthermore, if the researcher still wants to estimate the ratio within each group, bivariate mixture models allow this through taking the ratio of the estimated means of GY and NU. The confidence set of the ratio of expectations can also be derived by straightforward calculations (Fieller 1954). However, inference with mixture models is a difficult task (Chen & Tan 2009), and we thus use the technique here for exploratory purposes only.

This study aims to demonstrate the usefulness of the bivariate mixture methodology as a complementary analysis to treatment-based analyses in designed field trials when the objective is to identify potential environmental factors driving GY and NU. We discuss the benefits of the bivariate mixture and mixed analyses in comparison with the univariate counterparts for IEN. The proposed methodology is applied to a particular designed field experiment on non-irrigated rice reported in Naklang et al. (2006). In that study the design of the field experiment was treatment-based. However, the objective of the study was to analyse IEN across a range of environmental conditions without focusing exclusively on the actual treatments. This was intended to contribute to a better understanding of non-irrigated systems and to improve current management practices.

The present paper is structured as follows. The fundamentals of finite mixture models are first reviewed. The case study is then described and analysed by both mixed and mixture models on IEN and (NU, GY). The final section discusses the advantages of using bivariate models instead of the univariate models on IEN as well as the benefits of the mixture model methodology for identifying environmental conditions affecting (NU, GY).

2. Finite mixture models of bivariate Gaussian distributions

2.1. Definitions and notations

Let $Y_j^{(2 \times 1)}$ be a random vector, $j = 1, \ldots, n$, where $n$ is the sample size. Let $y_j$ be the observed value of the random vector $Y_j$. Mixture models of bivariate Gaussian distributions consider the observations to be independent and identically distributed (i.i.d.) according to the finite mixture density:

$$f(y_j \mid \psi) = \sum_{i=1}^{g} \pi_i \, \phi(y_j \mid \mu_i, \Sigma_i) \qquad (1)$$

where $g$ is the number of components, $\pi_i$ is the mixing proportion of the i-th component and $\phi$ is the bivariate normal density function. The parameters of the component densities are $\mu_i$, the joint mean, and $\Sigma_i$, the covariance matrix. The parameter vector of the mixture is $\psi = (\pi_1, \ldots, \pi_g, \mu_1, \ldots, \mu_g, \Sigma_1, \ldots, \Sigma_g)$.

Finite mixture models are commonly used for clustering data (Everitt 1996), in which case components correspond to groups. Clustering is achieved by assigning each observation to the group (G) with the maximum posterior probability:

$$\tau_{ij} = P(y_j \in G_i \mid y_j) = \frac{\pi_i \, \phi(y_j \mid \mu_i, \Sigma_i)}{\sum_{h=1}^{g} \pi_h \, \phi(y_j \mid \mu_h, \Sigma_h)} \qquad (2)$$

The calculation of the posterior probabilities requires estimating $\psi$. This is achieved via the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), an iterative procedure for calculating maximum likelihood estimates in a framework of missing values. The fundamentals of this approach are briefly reviewed below.
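The posterior probabilities of Eq. 2 and the resulting group assignment can be illustrated with a short Python sketch that uses only the standard library. The function names (`bvn_pdf`, `posterior_probs`, `assign_group`) are hypothetical and are not taken from EMMIX or any other package; the sketch assumes that the mixture parameters are already known or estimated and that every covariance matrix is positive definite.

```python
import math

def bvn_pdf(y, mu, sigma):
    """Bivariate normal density; sigma is ((s11, s12), (s12, s22))."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form d' Sigma^{-1} d via the closed-form 2x2 inverse.
    q = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def posterior_probs(y, pis, mus, sigmas):
    """Posterior probabilities tau_i of Eq. 2 for a single observation y."""
    weighted = [p * bvn_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas)]
    total = sum(weighted)
    return [w / total for w in weighted]

def assign_group(y, pis, mus, sigmas):
    """Index of the component with the maximum posterior probability."""
    tau = posterior_probs(y, pis, mus, sigmas)
    return max(range(len(tau)), key=tau.__getitem__)
```

An observation lying far from every component mean can make all weighted densities underflow to zero; a production implementation would work on the log scale, which this sketch omits for brevity.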

2.2. The EM algorithm

The EM algorithm estimates the mixture parameters by interpreting the data as a missing data problem. The data are considered to be incomplete by introducing a g-dimensional random vector $Z_j$, which matches $Y_j$ with its group of origin in the following way:

$$Z_{ij} = \begin{cases} 1, & \text{if } Y_j \text{ belongs to } G_i; \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1, \ldots, g; \; j = 1, \ldots, n.$$

The $Z_j$ are i.i.d. according to a multinomial distribution with $g$ possible outcomes. The i-th outcome occurs with probability $\pi_i$. The realisations of $Z_j$, denoted as $z_j$, are treated as missing observations, so that the complete data are $[z_1, \ldots, z_n, y_1, \ldots, y_n]$.

The log likelihood function (L) of the incomplete data $[y_1, \ldots, y_n]$ is mathematically intractable, and the root of the log likelihood equations ($\partial L / \partial \psi = 0$) cannot be calculated explicitly (Redner & Walker 1984). The EM algorithm searches for a local maximum of L by maximising the expectation of the log likelihood function of the complete data for given values of $[y_1, \ldots, y_n]$ and $\psi$. This results in a simpler optimisation problem with the advantage of producing, from a given initial estimate, $\psi^{(0)}$, a sequence of estimates, $\{\psi^{(r)}\}_{r=0}^{s}$, such that $L(\psi^{(0)}) \le L(\psi^{(1)}) \le \ldots \le L(\psi^{(s)})$ (Dempster et al. 1977). Under very weak conditions, the sequence $\{\psi^{(r)}\}_{r=0}^{s}$ converges to a stationary point of $L(\psi)$, which can be a local or global maximum, if the algorithm is not trapped in a saddle point (Wu 1983).

For bivariate mixture models of Gaussian distributions (Eq. 1), the sequence of estimates, $\{\psi^{(r)}\}_{r=0}^{s}$, is generated starting from a chosen initial estimate and iteratively performing the two EM steps, which are outlined below for the (r+1)-th iteration.

1. Update the posterior probabilities (E-step):

$$\tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \phi(y_j \mid \mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{h=1}^{g} \pi_h^{(r)} \phi(y_j \mid \mu_h^{(r)}, \Sigma_h^{(r)})}$$

2. Update the estimates of the mixture parameters (Eq. 1) by using the new posterior probabilities (M-step):

$$\pi_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big/ n$$

$$\mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)}$$

$$\Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})(y_j - \mu_i^{(r+1)})^{\top} \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)}$$

The algorithm stops when the difference between the values of the log likelihood function at two consecutive iterations is smaller than a chosen threshold.
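The two EM steps and the stopping rule described above can be sketched as follows. This is an illustrative standard-library implementation under simplifying assumptions, not the authors' R code: all names (`em_step`, `fit_em`, `log_lik`) are hypothetical, covariance matrices are stored as nested 2x2 lists, and no protection against singularities or numerical underflow is included.

```python
import math

def bvn_pdf(y, mu, s):
    """Bivariate normal density with 2x2 covariance s."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    q = (s[1][1] * d1 * d1 - 2 * s[0][1] * d1 * d2 + s[0][0] * d2 * d2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def log_lik(ys, pis, mus, sigmas):
    """Log likelihood of the incomplete data under the mixture (Eq. 1)."""
    return sum(math.log(sum(p * bvn_pdf(y, m, s)
                            for p, m, s in zip(pis, mus, sigmas))) for y in ys)

def em_step(ys, pis, mus, sigmas):
    n, g = len(ys), len(pis)
    # E-step: posterior probabilities tau[j][i] from the current estimates.
    tau = []
    for y in ys:
        w = [p * bvn_pdf(y, m, s) for p, m, s in zip(pis, mus, sigmas)]
        total = sum(w)
        tau.append([wi / total for wi in w])
    # M-step: weighted proportions, means and covariance matrices.
    new_pis, new_mus, new_sigmas = [], [], []
    for i in range(g):
        t_sum = sum(t[i] for t in tau)
        mu = [sum(t[i] * y[k] for t, y in zip(tau, ys)) / t_sum for k in (0, 1)]
        sig = [[sum(t[i] * (y[a] - mu[a]) * (y[b] - mu[b])
                    for t, y in zip(tau, ys)) / t_sum
                for b in (0, 1)] for a in (0, 1)]
        new_pis.append(t_sum / n)
        new_mus.append(mu)
        new_sigmas.append(sig)
    return new_pis, new_mus, new_sigmas

def fit_em(ys, pis, mus, sigmas, tol=1e-6, max_iter=500):
    """Iterate until the log likelihood increases by less than tol."""
    trace = [log_lik(ys, pis, mus, sigmas)]
    for _ in range(max_iter):
        pis, mus, sigmas = em_step(ys, pis, mus, sigmas)
        trace.append(log_lik(ys, pis, mus, sigmas))
        if trace[-1] - trace[-2] < tol:
            break
    return pis, mus, sigmas, trace
```

The returned `trace` holds the log likelihood at each iteration, which makes it easy to check the monotone increase that the EM algorithm guarantees.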

2.3. Fitting mixture models

Fitting mixtures of multivariate Gaussian distributions with unconstrained covariance matrices is a challenging statistical and computational task widely discussed in the statistical literature (e.g. Figueiredo & Jain 2002; Melnykov & Maitra 2010). The issues encountered are related to the nature of L(ψ) rather than to a failure of the EM algorithm (McLachlan & Peel 2000, p. 99). Firstly, L(ψ) is unbounded at the points (called singularities) where the estimated mean of one of the mixture components is equal to a yj and the determinant of the covariance matrix of this component tends to zero (Kiefer & Wolfowitz 1956; McLachlan & Peel 2000, p. 94). Thus, the global maximum of L(ψ) does not exist and a local maximum needs to be chosen as the maximum likelihood estimate (Redner & Walker 1984). Secondly, L(ψ) can present multiple local maxima, bringing the dilemma of which particular local maximum to choose and causing sensitivity of the algorithm to starting and stopping rules (Seidel et al. 2000). Finally, L(ψ) can present a large local maximum when fitting a component to few data points whose covariance matrix has at least one eigenvalue that is very small. These solutions are known as spuriosities (McLachlan & Peel 2000, p. 99), and we refer to them as mathematical spuriosities. It is also possible that some solutions fall outside the biologically meaningful range of parameters. Instead of restricting the parameter space, one can additionally consider such solutions as biological spuriosities.

A recommended strategy [see McLachlan & Peel (2000, p. 97), Melnykov & Maitra (2010) and Ng (2013)] for selecting the estimates of the mixture parameters is to decide on the meaningful maximum number of components to fit, and to perform the following steps, starting with fitting the maximum number of components and decreasing to just one component:

1. Initiate the EM algorithm from different starting values of the parameters $\psi^{(0)}$ or group partitions $z_j^{(0)}$.

2. Run the EM algorithm on the unrestricted parameter space until the difference between the log likelihood function of two consecutive iterations is less than a chosen threshold (c).

3. Examine and remove singularities and mathematical or biological spuriosities.

4. From the remaining solutions, select the one which gives the highest value of the log likelihood function.
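Steps 3 and 4 of this strategy can be sketched in Python as a filter followed by a maximisation. The sketch is only an illustration: the representation of a fitted solution as a dictionary with keys `loglik`, `pis` and `sigmas`, the function names, and the thresholds `eig_tol` and `min_points` are all assumptions introduced here, not values from the source. Biological spuriosities depend on subject-matter knowledge and would still need to be screened by the analyst.

```python
import math

def min_eigenvalue(s):
    """Smallest eigenvalue of a symmetric 2x2 covariance matrix."""
    tr = s[0][0] + s[1][1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return 0.5 * (tr - math.sqrt(max(tr * tr - 4.0 * det, 0.0)))

def is_degenerate(fit, n, eig_tol=1e-6, min_points=5):
    """Flag singular or mathematically spurious solutions (hypothetical rule):
    a near-zero eigenvalue, or a component supported by very few points."""
    tiny_eig = any(min_eigenvalue(s) < eig_tol for s in fit["sigmas"])
    tiny_group = any(p * n < min_points for p in fit["pis"])
    return tiny_eig or tiny_group

def select_solution(fits, n, eig_tol=1e-6, min_points=5):
    """Among non-degenerate EM solutions, return the one with the highest
    log likelihood, or None if every candidate was removed."""
    kept = [f for f in fits if not is_degenerate(f, n, eig_tol, min_points)]
    if not kept:
        return None
    return max(kept, key=lambda f: f["loglik"])
```

Filtering before maximising matters because spurious solutions typically have the largest log likelihood values among all candidates.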

Multiple starting strategies have been proposed to initiate the EM algorithm (e.g. Biernacki et al. 2003; Karlis & Xekalaki 2003; Maitra 2009) and a comprehensive review can be found in McLachlan & Peel (2000, p. 54-55). However, none of the starting strategies has been shown to outperform the others in all cases (Melnykov & Maitra 2010). For the purpose of our study, we have selected the following five strategies:

1. Random starts: the initial partition is constructed by randomly allocating each observation to one of the groups (McLachlan & Peel 2000, p. 55).

2. K-means: the initial partition is provided by the k-means algorithm (Forgy 1965 cited in Omran et al. 2007), which produces a group partition by minimising the distance from observations to the means of the groups (McLachlan & Peel 2000, p. 54).

3. Simulated means: the EM algorithm is initiated with g simulated means from a bivariate normal distribution with the mean and covariance matrix calculated from the sample. The mixing proportion and covariance matrix for the i-th component are given by $\pi_i^{(0)} = 1/g$ and $\Sigma_i^{(0)} = S$, where S is the covariance matrix of the entire data set (McLachlan & Peel 2000, p. 55).

4. Subsample solution: the initial value of the parameters is the solution obtained by the EM algorithm when the latter is applied to a random subsample and initiated from random starts. The size of the subsample needs to be large enough not to produce degenerate estimates (McLachlan & Peel 2000, p. 55).

5. Short runs of the EM algorithm: the initial value of the parameters is the solution with the highest value of the likelihood function obtained after running several short runs of the EM algorithm when the latter is initiated from random starts (Biernacki et al. 2003).
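As an illustration of the first strategy in the list above (random starts), the sketch below draws a random partition and converts it into initial values of the mixing proportions, means and covariance matrices. It is not the EMMIX implementation: the function name `random_start`, the `seed` argument and the `min_size` safeguard, which redraws partitions containing very small groups, are assumptions made for this example.

```python
import random

def random_start(ys, g, seed=0, min_size=3):
    """Initial (pis, mus, sigmas) from a random partition of the observations."""
    n = len(ys)
    if n < g * min_size:
        raise ValueError("too few observations for the requested number of groups")
    rng = random.Random(seed)
    while True:
        # Allocate each observation to one of the g groups at random.
        labels = [rng.randrange(g) for _ in range(n)]
        if min(labels.count(i) for i in range(g)) >= min_size:
            break  # otherwise redraw: tiny groups give degenerate covariances
    pis, mus, sigmas = [], [], []
    for i in range(g):
        pts = [y for y, lab in zip(ys, labels) if lab == i]
        m = len(pts)
        mu = [sum(p[k] for p in pts) / m for k in (0, 1)]
        sigma = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in pts) / m
                  for b in (0, 1)] for a in (0, 1)]
        pis.append(m / n)
        mus.append(mu)
        sigmas.append(sigma)
    return pis, mus, sigmas
```

Calling the function with different seeds gives the different starting values required by step 1 of the fitting strategy. Groups whose points happen to be collinear would still yield a singular covariance matrix, which this sketch does not check.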

The first two strategies are available in the package EMMIX (McLachlan et al. 1999). Strategies 3-5 were programmed by the first author in R (R Core Team 2012) (the code is available upon request).

Another important decision is the choice of the number of groups. A common approach is to select the model which minimises an information criterion (McLachlan & Peel 2000, p. 184) such as the Bayesian Information Criterion, BIC (Schwarz 1978), or Akaike’s Information Criterion, AIC (Akaike 1973, 1974):

$$BIC = -2L(\hat{\psi}) + d \log(n)$$

$$AIC = -2L(\hat{\psi}) + 2d$$

where $L(\hat{\psi})$ is the log likelihood value of the selected local maximum ($\hat{\psi}$); $d$ is the number of parameters to estimate, which for mixtures of bivariate Gaussian distributions with unrestricted covariance matrices is $d = 6g - 1$ (Eq. 1); and $n$ is the sample size.

The methodology described above is easily translated to the case of mixtures of univariate Gaussian distributions. In this case, the parameters to estimate are the mixing proportions, $\pi_i$, the means, $\mu_i$, and the variances, $\sigma_i^2$, $i = 1, \ldots, g$. For a review of mixture models of univariate Gaussian distributions, refer to Frühwirth-Schnatter (2006, p. 169-190).
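The two information criteria for the bivariate case can be computed directly from the maximised log likelihood. The following Python sketch is a simple illustration with hypothetical function names; the parameter count 6g - 1 is the one stated in the text for unrestricted covariance matrices (g - 1 free mixing proportions, 2g mean parameters and 3g covariance parameters).

```python
import math

def n_params_bivariate(g):
    """Number of free parameters: (g - 1) + 2g + 3g = 6g - 1."""
    return 6 * g - 1

def bic(loglik, g, n):
    """Bayesian Information Criterion for a g-component bivariate mixture."""
    return -2.0 * loglik + n_params_bivariate(g) * math.log(n)

def aic(loglik, g):
    """Akaike's Information Criterion for a g-component bivariate mixture."""
    return -2.0 * loglik + 2.0 * n_params_bivariate(g)
```

Under both criteria the preferred number of groups is the one with the smallest value; because log(n) exceeds 2 once n is larger than about 7, BIC penalises additional components more heavily than AIC for any realistic sample size.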

2.4. Visual guides for mixture models

As a visual guide for mixtures of bivariate Gaussian distributions, the data are usually plotted in a scatter together with prediction/coverage ellipses of the mixture groups (e.g. Figueiredo & Jain 2002; McLachlan & Peel 2000, p. 103). This visual method is in accordance with those suggested by Friendly (2006) for multivariate analyses of variance. The ellipses depict the means, variances and correlations of each group and are thus useful for identifying both biological and mathematical spuriosities. For instance, ellipses with elongated or small circular shapes correspond to mathematical spuriosities (McLachlan & Peel 2000, p. 103). In the univariate case, the histogram of the data together with the probability density functions of the fitted components is commonly used for visualisation purposes (e.g. Benaglia et al. 2009). Spuriosities are detected by inspecting the component densities. For instance, components with small variances correspond to mathematical spuriosities (e.g. McLachlan & Peel 2000, p. 104).
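The geometry of a coverage ellipse follows from the eigen-decomposition of the group covariance matrix. The sketch below is a hedged illustration, not the plotting code used by the authors: the function name `coverage_ellipse` is hypothetical, and the covariance matrix is treated as known, so that the ellipse scale is the chi-square quantile with two degrees of freedom, which equals -2 log(1 - p). A prediction ellipse based on estimated parameters would use a different scaling.

```python
import math

def coverage_ellipse(sigma, p=0.95):
    """Semi-major axis, semi-minor axis and orientation (radians, measured
    from the first coordinate axis) of the p-coverage ellipse."""
    a, b, c = sigma[0][0], sigma[0][1], sigma[1][1]
    half_tr = 0.5 * (a + c)
    root = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lam_max, lam_min = half_tr + root, half_tr - root  # eigenvalues
    scale = -2.0 * math.log(1.0 - p)                   # chi-square(2) quantile
    angle = 0.5 * math.atan2(2.0 * b, a - c)           # major-axis direction
    return math.sqrt(scale * lam_max), math.sqrt(scale * lam_min), angle
```

A very small semi-minor axis relative to the semi-major axis signals the elongated shape that the text associates with mathematical spuriosities.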

3. Case study

The data set considered (Haefele, S.M., pers. comm., 2013) comprises 624 plot observations of NU and GY of non-irrigated rice from the field experiments reported in Naklang et al. (2006). The experiments were part of the field trials conducted by Wade et al. (1999). Neither Wade et al. (1999) nor Naklang et al. (2006) formulated their research questions exclusively in terms of treatment-based analyses; both also aimed to improve understanding of non-irrigated rice systems across different environmental conditions. As in our approach, Wade et al. (1999) performed a cluster analysis, using Ward’s agglomerative hierarchical algorithm (Delacy et al. 1996), to identify environmental groups. However, the cluster analysis was conducted exclusively on GY after removing the site and treatment effects.

The field experiments were carried out at eight sites in the northeast of Thailand: Udon Thani, Sakhon Nakhon, Khon Kaen, Chum Phae, Tung Kula Ronghai, Phi Mai, Surin and Ubon Ratchathani (see Figure 1 in Naklang et al. (2006) for their locations). At each site, a randomised complete block design was implemented with three blocks and eight fertiliser treatments (Table 1) applied on plots of 20 m2. The experimental layout was maintained over the wet seasons of 1995, 1996 and 1997. For each site and year, the soil water status was visually assessed every week and rated according to three categories: dry soil surface, wet soil surface or ponded water. However, only the dominant water conditions at pre-flowering, flowering and post-flowering were reported. This procedure resulted in the water conditions being completely confounded with the site.year effect in the design (Table 2, some sites excluded as explained later). The experiment was repeated fully irrigated only at the Ubon Ratchathani site. Grain and straw yield samples were collected from an area of 8 m2 in the centre of each plot and analysed for N concentration.
Grain yield was adjusted to a standard moisture content of 14%. Nitrogen uptake was estimated by the following equation:

$$NU = \frac{[N]_S \, S + m \, [N]_G \, GY}{1000} \quad \text{(kg/ha)}$$

where $[N]_S$ is N concentration in straw (g/kg), $[N]_G$ is N concentration in dry grain (g/kg), S is straw yield (kg/ha), GY is grain yield (kg/ha) and m is the moisture correction factor, equal to 0.86.
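The nitrogen uptake equation and the resulting efficiency index can be written as two small Python functions. The function and argument names are hypothetical, and the numbers in the usage note below are invented purely to illustrate the arithmetic; they are not observations from the case study.

```python
def nitrogen_uptake(n_straw, straw_yield, n_grain, grain_yield, m=0.86):
    """NU in kg/ha from N concentrations (g/kg) and yields (kg/ha);
    m is the moisture correction factor applied to grain yield."""
    return (n_straw * straw_yield + m * n_grain * grain_yield) / 1000.0

def internal_efficiency(grain_yield, nu):
    """IEN as the ratio of grain yield to nitrogen uptake (kg grain per kg N)."""
    return grain_yield / nu
```

For a hypothetical plot with 5 g/kg N in 3000 kg/ha of straw and 10 g/kg N in 2000 kg/ha of grain, the uptake is (15000 + 17200) / 1000 = 32.2 kg/ha, and the index is 2000 / 32.2, roughly 62 kg grain per kg N.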

TABLE 1 ABOUT HERE

TABLE 2 ABOUT HERE

The observations from Chum Phae, Phi Mai and Tung Kula Ronghai were excluded for the following reasons. Chum Phae and Phi Mai had a large number of missing values of $[N]_G$ or $[N]_S$. In Tung Kula Ronghai, unlike the remaining sites, rice was direct-seeded in 1996, which resulted in observations substantially different from those coming from seedlings. All the (NU, GY) observations from the remaining sites over the three years were included and presented the typical linear-plateau scatter (Figure 1).

FIGURE 1 ABOUT HERE

4. Linear mixed model analyses

4.1. Univariate mixed model

Internal nitrogen use efficiency was analysed with a univariate mixed model in Genstat (VSN International 2012). At each site, the design considered was a strip plot with three blocks, with treatments and years treated as the strip factors (Table 3). Let s, t, y, b and n denote the number of sites, fertiliser treatments, years, blocks and the total number of observations, respectively. For this case study, s = 6, t = 8, y = 3, b = 3 and n = 432. Let $\mathbf{y}^{(n \times 1)}$ be the vector of IEN observations; $\mathbf{y}$ was modelled as:

$$\mathbf{y} = X\tau + Z_a u_a + Z_b u_b + Z_c u_c + Z_d u_d + Z_e u_e + \varepsilon$$

where $\tau^{(k \times 1)}$ is the vector of fixed effects containing the overall mean, year, treatment and treatment.year effects, $k = 1 + y + t + yt$; and $X^{(n \times k)}$ is the corresponding design matrix. The vectors $u_a^{(s \times 1)}$, $u_b^{(sb \times 1)}$, $u_c^{(st \times 1)}$, $u_d^{(sy \times 1)}$, $u_e^{(sty \times 1)}$ are the site, site.block, site.treatment, site.year and site.treatment.year random effects with design matrices $Z_a^{(n \times s)}$, $Z_b^{(n \times sb)}$, $Z_c^{(n \times st)}$, $Z_d^{(n \times sy)}$ and $Z_e^{(n \times sty)}$, respectively. The vector $\varepsilon^{(n \times 1)}$ contains the plot errors. Site was considered a random term because the locations were randomly selected from northeast Thailand. The random effects and plot errors were assumed to be independent and normally distributed with mean zero and $var(u_a) = \sigma_a^2 I_s$, $var(u_b) = \sigma_b^2 I_{sb}$, $var(u_c) = \sigma_c^2 I_{st}$, $var(u_d) = \sigma_d^2 I_{sy}$, $var(u_e) = \sigma_e^2 I_{sty}$ and $var(\varepsilon) = \sigma^2 I_n$, where var() denotes the covariance matrix and $I_r$ the identity matrix of dimension r.
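Because the random effects and the plot errors are assumed to be mutually independent, the model implies the marginal covariance matrix of the observation vector written out below. This is a standard consequence of the stated assumptions rather than a formula quoted from the source, and the symbol $V$ is introduced here only for illustration.

```latex
\operatorname{var}(\mathbf{y}) = V
  = \sigma_a^2 Z_a Z_a^{\top} + \sigma_b^2 Z_b Z_b^{\top}
  + \sigma_c^2 Z_c Z_c^{\top} + \sigma_d^2 Z_d Z_d^{\top}
  + \sigma_e^2 Z_e Z_e^{\top} + \sigma^2 I_n
```

Each term contributes a block structure: for example, $\sigma_a^2 Z_a Z_a^{\top}$ adds the covariance $\sigma_a^2$ between any two plots at the same site.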

TABLE 3 ABOUT HERE

In the univariate mixed model analysis of IEN, treatment was significant (p-value ≤ 0.01), whereas year and treatment.year were not (p-values of 0.45 and 0.86, respectively). Control and PK presented the largest means of IEN, and FYM NPK and ALL presented the lowest (Table 4). The random terms site.year and site.treatment.year had very large variance components (Table 5).

The IEN residuals did not violate the normality and homoscedasticity assumptions in any obvious way, except for some slight heavy-tailedness (Figure 2).

TABLE 4 ABOUT HERE

TABLE 5 ABOUT HERE

FIGURE 2 ABOUT HERE

4.2. Bivariate mixed model

The (NU, GY) data were analysed by a bivariate mixed model fitted with ASReml-R (Butler et al. 2007). As in the univariate case, at each site, the design considered was a strip plot with three blocks and with treatments and years treated as strip factors (Table 3). Let $y^{(2n \times 1)} = [y_{NU}^\top, y_{GY}^\top]^\top$ be the vector containing the observations of NU and GY; $y^{(2n \times 1)}$ was modelled as:

$$y = X\tau + Z_a u_a + Z_b u_b + Z_c u_c + Z_d u_d + Z_e u_e + \epsilon$$

where $\tau^{(2k \times 1)}$ is the vector of fixed effects containing the overall joint mean, year, treatment and treatment.year effects for both NU and GY, and $X^{(2n \times 2k)}$ is the corresponding design matrix. The vectors $u_a^{(2s \times 1)}$, $u_b^{(2sb \times 1)}$, $u_c^{(2st \times 1)}$, $u_d^{(2sy \times 1)}$ and $u_e^{(2sty \times 1)}$ are the site, site.block, site.treatment, site.year and site.treatment.year random effects for NU and GY with design matrices $Z_a^{(2n \times 2s)}$, $Z_b^{(2n \times 2sb)}$, $Z_c^{(2n \times 2st)}$, $Z_d^{(2n \times 2sy)}$ and $Z_e^{(2n \times 2sty)}$, respectively.

The vector $\epsilon = (\epsilon_1^\top, \ldots, \epsilon_n^\top)^\top$ contains the plot errors. As in the univariate analysis, site was considered a random term. The random effects and plot errors were assumed to be independent and normally distributed with mean zero and the following covariance matrices:

$$\mathrm{var}(u_a) = \begin{bmatrix} \sigma_{a1}^2 & \sigma_{a12} \\ \sigma_{a12} & \sigma_{a2}^2 \end{bmatrix} \otimes I_s, \qquad \mathrm{var}(u_b) = \begin{bmatrix} \sigma_{b1}^2 & \sigma_{b12} \\ \sigma_{b12} & \sigma_{b2}^2 \end{bmatrix} \otimes I_{sb},$$

$$\mathrm{var}(u_c) = \begin{bmatrix} \sigma_{c1}^2 & \sigma_{c12} \\ \sigma_{c12} & \sigma_{c2}^2 \end{bmatrix} \otimes I_{st}, \qquad \mathrm{var}(u_d) = \begin{bmatrix} \sigma_{d1}^2 & \sigma_{d12} \\ \sigma_{d12} & \sigma_{d2}^2 \end{bmatrix} \otimes I_{sy},$$

$$\mathrm{var}(u_e) = \begin{bmatrix} \sigma_{e1}^2 & \sigma_{e12} \\ \sigma_{e12} & \sigma_{e2}^2 \end{bmatrix} \otimes I_{sty}, \qquad \mathrm{var}(\epsilon) = \begin{bmatrix} \sigma_{1}^2 & 0 \\ 0 & \sigma_{2}^2 \end{bmatrix} \otimes I_n,$$

where $\otimes$ denotes the Kronecker product, $\sigma_{r1}^2$ and $\sigma_{r2}^2$ are the $u_r$ variances for NU and GY, respectively, and $\sigma_{r12}$ is the covariance of $u_r$ between NU and GY.

In the bivariate mixed model analysis, treatment was significant (p-value ≤ 0.001 for GY and NU) whereas year and treatment.year were not (p-values of 0.77 and 0.12 for GY and NU for year, and 0.86 and 0.12 for GY and NU for treatment.year). Control and PK presented the lowest means for both NU and GY, and FYM NPK and ALL the largest (Table 4). A large variance component for site was observed, indicating great heterogeneity for both NU and GY among sites (Table 6). The high correlation between GY and NU (Table 6) in all the random effects is partially due to the fact that NU is a variable derived from GY. Thus, it is likely that this correlation is spurious.
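The Kronecker structure of these covariance matrices can be illustrated with a minimal Python sketch (the analyses themselves were run in ASReml-R; this is not that code). It uses the site-level estimates reported in Table 6 and recovers the covariance from the reported correlation. The number of sites `s = 3` and the helper names `kron` and `identity` are hypothetical choices made only to keep the example small.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Site-level estimates from Table 6: variance for NU, variance for GY, correlation.
var_nu, var_gy, corr = 245.44, 318818.03, 0.96
cov = corr * math.sqrt(var_nu * var_gy)  # covariance implied by the correlation

G_site = [[var_nu, cov], [cov, var_gy]]

s = 3  # hypothetical number of sites, chosen small for display
V = kron(G_site, identity(s))  # var(u_a) = G_site (x) I_s, a 2s x 2s matrix

for row in V:
    print(["%.1f" % v for v in row])
```

With the observations stacked trait by trait, the first `s` rows of `V` refer to the NU site effects and the last `s` rows to the GY site effects, so the only non-zero off-diagonal entries link NU and GY at the same site.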

TABLE 6 ABOUT HERE

4.3. Limitations of the linear mixed models

In agricultural research, field trials are designed to compare IEN across different fertiliser treatments. However, the utilisation of NU for GY depends on the levels of nutrients available in soil rather than the amount of nutrients applied. The availability of nutrients is conditioned by complex environmental interactions between climate, plants and soil and varies greatly with time and space (Marschner 2012, p. 136). This may result in very different levels of available nutrients, even in plots under the same fertiliser application and agronomic practice. Thus, such trials may present non-uniform (ill-defined) treatments. For instance, Control was clearly ill-defined in our case study. This is evident even though there were no direct measures of available N, as NU can be taken as a good surrogate indicator of available N in Control plots (Dobermann et al. 2003). Since there was considerable variation of NU among the Control plots at different sites and in different years (see Figure 1), considerable variation of available N was expected. Additionally, for this case study there was no design factor which could explain the effect of water. Furthermore, site and year created a range of different environmental circumstances but did not shed much light on the relationship between NU and GY.

5. Mixture model analyses

5.1. Univariate mixture model

For this case study, the residuals of IEN from the univariate linear mixed model did not violate the normality and homoscedasticity assumptions in any obvious way (Figure 2). Thus, the probability density of the ratio can be approximated by a mixture of univariate Gaussian components. The univariate mixture analysis was performed using the R (R Core Team 2012) mixtools package (Benaglia et al. 2009), which implements the EM algorithm for fitting mixtures of univariate Gaussian distributions, with the settings specified in Table 7. The R code is available upon request.
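To make the EM iteration concrete, the following is a minimal Python sketch of EM for a two-component univariate Gaussian mixture. It is not the mixtools implementation and not the R code used for the analysis. The function names, the starting values and the simulated data set (400 values drawn from the Table 9 estimates, an arbitrary sample size) are assumptions made for illustration. The stopping threshold of $10^{-6}$ echoes the value listed in Table 7, although the exact stopping rule used there is not specified here.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, mu, sigma, pi1, tol=1e-6, max_iter=1000):
    """EM for a two-component univariate Gaussian mixture.

    mu, sigma: starting means and standard deviations (length-2 sequences).
    pi1: starting mixing proportion of the first component.
    Returns (pi1, mu, sigma, log-likelihood history).
    """
    mu, sigma = list(mu), list(sigma)
    n = len(data)
    history = []
    for _ in range(max_iter):
        # Weighted component densities at the current parameter values.
        w1 = [pi1 * normal_pdf(x, mu[0], sigma[0]) for x in data]
        w2 = [(1.0 - pi1) * normal_pdf(x, mu[1], sigma[1]) for x in data]
        history.append(sum(math.log(a + b) for a, b in zip(w1, w2)))
        # Stop when the log-likelihood gain falls below the threshold.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # E-step: posterior probability of belonging to the first component.
        tau = [a / (a + b) for a, b in zip(w1, w2)]
        # M-step: update the proportion, means and standard deviations.
        n1 = sum(tau)
        n2 = n - n1
        pi1 = n1 / n
        mu[0] = sum(t * x for t, x in zip(tau, data)) / n1
        mu[1] = sum((1.0 - t) * x for t, x in zip(tau, data)) / n2
        sigma[0] = math.sqrt(sum(t * (x - mu[0]) ** 2 for t, x in zip(tau, data)) / n1)
        sigma[1] = math.sqrt(sum((1.0 - t) * (x - mu[1]) ** 2 for t, x in zip(tau, data)) / n2)
    return pi1, mu, sigma, history


# Illustrative data simulated from the Table 9 estimates (not the study data).
random.seed(2014)
data = [
    random.gauss(38.36, 6.14) if random.random() < 0.45 else random.gauss(49.01, 11.60)
    for _ in range(400)
]
pi1, mu, sigma, history = em_two_gaussians(data, mu=[35.0, 55.0], sigma=[8.0, 8.0], pi1=0.5)
print(round(pi1, 2), [round(m, 2) for m in mu], [round(s, 2) for s in sigma])
```

Because the two components overlap heavily, estimates from a single start can be sensitive to the starting values, which is one reason for running several starting strategies.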

TABLE 7 ABOUT HERE

The information criteria AIC and BIC selected two as the optimal number of groups (Table 8). The first group included the plots with IEN between 24 and 44 and the second all the others. The second group had a larger mean and standard deviation than the first (Table 9).
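The selection step can be reproduced from the reported criteria. The sketch below, in Python rather than the R used for the analysis, gives the usual definitions of AIC and BIC and then picks the number of groups with the smallest value of each criterion from the figures in Table 8. The function and variable names are illustrative.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# (AIC, BIC) by number of groups, copied from Table 8 (univariate IEN analysis).
table8 = {
    1: (3296.07, 3304.20),
    2: (3270.13, 3290.47),
    3: (3272.12, 3304.67),
    4: (3275.66, 3320.42),
    5: (3281.43, 3338.39),
    6: (3287.43, 3356.60),
}

best_aic = min(table8, key=lambda g: table8[g][0])
best_bic = min(table8, key=lambda g: table8[g][1])
print(best_aic, best_bic)  # → 2 2
```

Both criteria penalise the number of free parameters. The BIC penalty grows with the sample size, so BIC tends to favour fewer groups than AIC.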

TABLE 8 ABOUT HERE

TABLE 9 ABOUT HERE

The first group contained 70% of the plots which received FYM NPK (Figure 3a). For the other fertiliser treatments, the proportions of observations falling into the two groups did not differ as much as for FYM NPK (e.g. Figure 3b). Specifically, the proportions of observations classified in the first and second group for each fertiliser treatment are as follows: Control (42% vs 58%), PK (41% vs 59%), N (64% vs 36%), FYM (42% vs 58%), NPK (60% vs 40%), CR NPK (50% vs 50%) and ALL (66% vs 34%). The proportions of observations classified in the groups for each soil water status at post-flowering were: dry (45% vs 55%), wet (62% vs 38%) and ponded water (52% vs 48%) (Figure 3c).
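How an observation ends up in one group or the other can be sketched as follows. The example assumes that each plot is assigned to the group with the largest posterior probability; this is the usual rule in mixture-model clustering, but the text does not state it explicitly. It uses the rounded estimates from Table 9. The code is Python, not the R code used for the analysis, and the function names are illustrative.

```python
import math

# Table 9 estimates: group -> (mixing proportion, mean, standard deviation).
groups = {1: (0.45, 38.36, 6.14), 2: (0.55, 49.01, 11.60)}


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior(x):
    """Posterior probability of each group for an IEN value x."""
    weighted = {g: p * normal_pdf(x, m, s) for g, (p, m, s) in groups.items()}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


def classify(x):
    """Group with the largest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)


for ien in (20, 25, 38, 44, 45, 60):
    print(ien, classify(ien), round(posterior(ien)[1], 2))
```

With these rounded estimates, values of roughly 24 to 44 are assigned to the first group and values on either side to the second. This agrees with the range reported above and reflects the larger standard deviation of the second group.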

FIGURE 3 ABOUT HERE

5.2. Bivariate mixture model

The (NU, GY) data were analysed using the R (R Core Team 2012) package EMMIX (McLachlan et al. 1999), which implements the EM algorithm for fitting mixtures of multivariate Gaussian distributions (the settings are specified in Table 7). The R code is available upon request. The information criterion BIC selected three groups, whereas AIC selected five (Table 10). BIC has been shown to perform better than AIC in mixture-model cluster analysis (Fonseca & Cardoso 2007). Thus, three was chosen as the optimal number of groups. The first and second groups presented lower means of NU and GY than the third (Table 11). In terms of the estimated correlations, NU and GY were tightly correlated in the first group (Table 11). The mean of IEN of each group was estimated from the bivariate analysis by taking the ratio of the components of the estimated joint mean, defined in Table 11 as $\widehat{IE}_{Ni}$. The confidence sets of $IE_{Ni}$ were derived by straightforward calculations according to the confidence rule in Fieller (1954) (Figure 4).
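The ratio of the joint mean components and a Fieller-type confidence set can be sketched in Python as below. This is a generic normal-based version of Fieller's construction under stated assumptions, not the calculation used to produce Figure 4. The means and standard errors are those of the first group in Table 11. The covariance between the two mean estimates is not reported there, so `assumed_corr = 0.9` is a purely hypothetical value, and the critical value 1.96 (an approximate 95% level) is likewise an assumption.

```python
import math


def fieller_interval(x_bar, y_bar, var_x, var_y, cov_xy, crit=1.96):
    """Fieller-type confidence set for the ratio y_bar / x_bar.

    Solves (y_bar - t*x_bar)^2 <= crit^2 * (var_y - 2*t*cov_xy + t^2*var_x) for t.
    Returns (lower, upper) when the set is a bounded interval, otherwise None.
    """
    c = crit ** 2
    a = x_bar ** 2 - c * var_x
    b = x_bar * y_bar - c * cov_xy
    d = y_bar ** 2 - c * var_y
    disc = b ** 2 - a * d
    if a <= 0 or disc < 0:
        return None  # denominator not clearly different from zero: unbounded set
    root = math.sqrt(disc)
    return ((b - root) / a, (b + root) / a)


# First group, Table 11: joint mean (NU, GY) and bootstrapped standard errors.
mean_nu, mean_gy = 20.42, 1070.22
se_nu, se_gy = 2.09, 157.00
assumed_corr = 0.9  # hypothetical: correlation between the two mean estimates

ratio = mean_gy / mean_nu
interval = fieller_interval(
    mean_nu, mean_gy, se_nu ** 2, se_gy ** 2, assumed_corr * se_nu * se_gy
)
print(round(ratio, 2))  # → 52.41
print(interval)
```

The width of the interval depends strongly on the assumed covariance between the two mean estimates, so the printed bounds should be read only as an illustration of the mechanics.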

TABLE 10 ABOUT HERE

TABLE 11 ABOUT HERE

FIGURE 4 ABOUT HERE

It appears that the soil water status post-flowering and the soil N supply are the main factors (from the measurements recorded for the analysis) defining the mixture groups (Figure 5). Most of the plots (62%) classified in the first group did not receive any added N fertiliser (Control or PK). Soil N supply is the limiting factor for grain production (Xu et al. 2012), which explains the low means of NU and GY and the strong correlation between NU and GY in this group (Table 11). The first group presented the largest mean of IEN (Table 11). This result was in agreement with those provided by the univariate linear mixed model analysis, in which Control and PK plots utilised N more efficiently (Table 4). There were also plots with no N added in the second group but at a lower proportion (21%) than in the first (62%). The plots with dry soil (86%) or ponded water (73%) post-flowering were mostly classified in the first or second groups (Figure 5). Plots with dry soil had lower values of NU and GY because many physiological processes related to the uptake of nutrients are impaired under water stress (Tanguilig et al. 1987). The low GY in plots with ponded water post-flowering may have been due to the fact that rice remained green for longer, which affected the grain filling period, translocation of N from green biomass into grain, and grain ripening (Ntanos & Koutroubas 2002). The plots in the third group were characterised by having N, P and K added as well as by being wet post-flowering: 71% of the plots with wet soil post-flowering were classified in the third group. This group had the largest mean GY.

FIGURE 5 ABOUT HERE

5.3. Limitations of the mixture model approach

Finite mixture models assume independence of observations. However, this assumption may be violated in designed field trials. Recent developments in mixture models allow correlated observations to be handled by mixtures of linear mixed models (Ng 2013). Conditional on the mixture components, the observations on the experimental units are modelled by a mixed model, which provides a means to estimate correlations (Ng 2013). In our case, modelling the correlation would be a complex task. For instance, the correlation between plots depends on N availability, which is conditioned by environmental factors (spatial soil variability, microbial activity) and agronomic practices (re-application of fertiliser over the 3 years and the presence of straw residues or losses of nutrients from the previous harvest). Therefore, the extension of mixture models to include this type of correlation is challenging and may not even be possible. Broadly speaking, as pointed out by Ng et al. (2006), adapting clustering techniques to a wide variety of experimental designs is an open research question.

6. Discussion

In published agricultural research, field trials are commonly designed to compare IEN across different fertiliser treatments and the analysis is often done by univariate linear models. However, environmental factors may cause plots under the same fertiliser treatment and agricultural practice to present different levels of available nutrients, resulting in a lack of consistency in treatment replications in field trials. Furthermore, univariate linear models of the ratio do not maintain the information on the original traits, and are often applied without checking the normality and homogeneity of error variance assumptions. Sampling across a range of environments may lead to different patterns (groups) in the utilisation of NU for GY in field data. In this study, we have investigated the use of bivariate mixture models for identifying groups. Once the groups are identified, their close investigation could reveal the underlying defining factors, which may not necessarily coincide with experimental treatments. The benefits of using bivariate mixture models in nitrogen efficiency research have been demonstrated in our analysis of the case study on non-irrigated rice. Soil water status post-flowering has been revealed as an environmental factor defining the mixture groups. In terms of fertiliser treatments, both bivariate mixture and mixed models indicate that plots with no added N produced less grain and showed less N uptake, which supports the view that soil N supply is the limiting factor for grain production (Xu et al. 2012).

This study has shown several benefits of using bivariate analyses on NU and GY in comparison with their univariate counterparts for IEN. Firstly, bivariate analyses preserve information on NU and GY, which is lost when the data are analysed as a ratio. For instance, different levels of a factor may affect both NU and GY proportionally. Thus, even if NU and GY values change considerably, only minor changes may be observed in the ratio. This can be seen in our case study. The means of NU and GY increased considerably when CR NPK was added to the soil in comparison with plots under Control treatment (Table 4). A difference of 16.31 kg/ha (SE = 4.17) was observed for NU and 630 kg/ha (SE = 127.9) for GY. However, the change in the ratio was minor, 3.09 (SE = 2.32). Similarly, the loss of information is illustrated in the univariate mixture analysis of IEN, which fails to identify soil water status as a factor defining the mixture groups (Figure 3c).

Secondly, bivariate analyses avoid dealing with the mixture and possible heavy-tailedness of the IEN distribution. The distribution of the ratio may violate the assumptions of normality and homogeneity of error variances, leading to unreliable inferences from linear mixed models. The departure from normality of the ratio distribution complicates the interpretation of components as physical groups in univariate mixture analyses with univariate Gaussian components. For instance, for the same physical environment, if the ratio distribution is skewed, the EM algorithm may detect more than one component in its attempt to approximate a non-Gaussian density.

There are some limitations that need to be considered when applying mixture models to designed field trials. Mixture models assume that the observations are independent, but possible correlations may arise as a result of the experimental design. Thus, in our opinion, bivariate mixture models should be applied to designed field trials for exploratory purposes only, and as a complement to bivariate mixed models. Furthermore, for this case study, utilising our approach most effectively would have required the recording of data on other potential environmental factors affecting nitrogen utilisation such as temperature, the presence of diseases or the indigenous levels of nutrients in the soil. Consequently, designs should be adopted which avoid correlated observations and are able to provide a more complete picture of the factors defining the mixture groups. Data collection through surveys is a more appropriate sampling procedure for the application of mixture models for clustering (e.g. Di Zio et al. 2005; Genge 2013). However, to the best of our knowledge, there is a lack of research on field survey design for efficient clustering with bivariate mixture models. Thus, how to best design a field survey for (GY, NU) to apply finite mixture models is an open research question.

In conclusion, bivariate mixture models of Gaussian distributions are a useful exploratory tool for identifying potential environmental factors driving NU and GY and effectively complement bivariate mixed models in the analysis of designed field trials. Bivariate mixture models can also be used for analysing other similar bivariate traits in agriculture or natural resource research, but to fully exploit their potential they should be applied to designed field surveys.
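The proportionality argument made in the Discussion (large changes in NU and GY, small change in their ratio) can be checked against the treatment means in Table 4. The following is a small Python sketch with illustrative variable names. It only reproduces the arithmetic on the reported means and does not recompute the standard errors.

```python
# Treatment means from Table 4: (IEN, NU in kg/ha, GY in kg/ha).
control = (48.87, 36.22, 1677.0)
cr_npk = (45.78, 52.53, 2307.0)

# Absolute differences between CR NPK and Control.
diff_nu = cr_npk[1] - control[1]
diff_gy = cr_npk[2] - control[2]
diff_ien = control[0] - cr_npk[0]

# Differences relative to the Control mean.
rel_nu = diff_nu / control[1]
rel_gy = diff_gy / control[2]
rel_ien = diff_ien / control[0]

print(round(diff_nu, 2), diff_gy, round(diff_ien, 2))  # → 16.31 630.0 3.09
print(round(100 * rel_nu), round(100 * rel_gy), round(100 * rel_ien))  # → 45 38 6
```

Relative to Control, NU and GY rise by roughly 45% and 38% under CR NPK, whereas IEN changes by only about 6%.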

References

AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki, editors, Second International Symposium on Information Theory. Akademia Kiado, Budapest. pp. 267–281.

AKAIKE, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control 19, 716–723.

BENAGLIA, T., CHAUVEAU, D., HUNTER, D.R. & YOUNG, D.S. (2009). mixtools: An R package for analyzing finite mixture models. J. Stat. Softw. 32, 1–29.

BIERNACKI, C., CELEUX, G. & GOVAERT, G. (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Statist. Data Anal. 41, 561–575.

BUTLER, D.G., CULLIS, B.R., GILMOUR, A.R. & GOGEL, B.J. (2007). ASReml-R reference manual. Queensland Department of Primary Industries and Fisheries, Australia.

CASSMAN, K.G., DOBERMANN, A. & WALTERS, D.T. (2002). Agroecosystems, nitrogen-use efficiency, and nitrogen management. Ambio 31, 132–140.

CHEN, J. & TAN, X. (2009). Inference for multivariate normal mixtures. J. Multivariate Anal. 100, 1367–1383.

CHEW, V. (1966). Confidence, prediction, and tolerance regions for the multivariate normal distribution. J. Amer. Statist. Assoc. 61, 605–617.

CRAMER, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press.

DELACY, I.H., BASFORD, K.E., COOPER, M., BULL, J.K. & MCLAREN, C.G. (1996). Analysis of multi-environment trials – an historical perspective. In Cooper, M. & Hammer, G.L. (ed.) Plant adaptation and crop improvement. CAB International, Wallingford, 39–124.

DELOGU, G., CATTIVELLI, L., PECCHIONI, N., DE FALCIS, D., MAGGIORE, T. & STANCA, A.M. (1998). Uptake and agronomic efficiency of nitrogen in winter barley and winter wheat. Eur. J. Agron. 9, 11–20.

DEMPSTER, A.P., LAIRD, N.M. & RUBIN, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.

DI ZIO, M., GUARNERA, U. & LUZI, O. (2005). Editing systematic unity measure errors through mixture modelling. Surv. Methodol. 31, 53–63.

DOBERMANN, A., WITT, C., ABDULRACHMAN, S. et al. (2003). Estimating indigenous nutrient supplies for site-specific nutrient management in irrigated rice. Agron. J. 95, 924–935.

EVERITT, B.S. (1996). An introduction to finite mixture distributions. Stat. Methods Med. Res. 5, 107–127.

FANG, Q., YU, Q., WANG, E. et al. (2006). Soil nitrate accumulation, leaching and crop nitrogen use as influenced by fertilization and irrigation in an intensive wheat-maize double cropping system in the North China Plain. Plant Soil 284, 335–350.

FIELLER, E.C. (1954). Some problems in interval estimation. J. Roy. Statist. Soc. Ser. B 16, 175–185.

FIGUEIREDO, M.A.T. & JAIN, A.K. (2002). Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24, 381–396.

FONSECA, J.R.S. & CARDOSO, M.G.M.S. (2007). Mixture-model cluster analysis using information theoretical criteria. Intell. Data Anal. 11, 155–173.

FRIENDLY, M. (2006). Data ellipses, HE plots and reduced-rank displays for multivariate linear models: SAS software and examples. J. Stat. Softw. 17, 1–43.

FRÜHWIRTH-SCHNATTER, S. (2006). Finite mixture and Markov switching models. Springer: New York.

GANESALINGAM, A., SMITH, A.B., BEECK, C.P., COWLING, W.A., THOMPSON, R. & CULLIS, B.R. (2013). A bivariate mixed model approach for the analysis of plant survival data. Euphytica 190, 371–383.

GENGE, E. (2013). A latent class analysis of the public attitude towards the euro adoption in Poland. Adv. Data Anal. Classif. (in press).

HAUCK, R.D. (1984). Nitrogen in Crop Production. Madison: American Society of Agronomy, Crop Science Society of America, Soil Science Society of America.

KARLIS, D. & XEKALAKI, E. (2003). Choosing initial values for the EM algorithm for finite mixtures. Comput. Statist. Data Anal. 41, 577–590.

KIEFER, J. & WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27, 887–906.

MAITRA, R. (2009). Initializing partition-optimization algorithms. IEEE/ACM Trans. Comput. Biol. Bioinf. 6, 144–157.

MARSAGLIA, G. (1965). Ratios of normal variables and ratios of sums of uniform variables. J. Amer. Statist. Assoc. 60, 193–204.

MARSAGLIA, G. (2006). Ratios of normal variables. J. Stat. Softw. 16, 1–10.

MARSCHNER, H. & MARSCHNER, P. (2012). Marschner's mineral nutrition of higher plants. London: Elsevier Science.

MCLACHLAN, G.J. & BASFORD, K.E. (1988). Mixture models. Inference and applications to clustering. New York: Dekker.

MCLACHLAN, G.J. & PEEL, D. (2000). Finite mixture models. New York: Wiley.

MCLACHLAN, G.J., PEEL, D., BASFORD, K.E. & ADAMS, P. (1999). The EMMIX software for the fitting of mixtures of normal and t-components. J. Stat. Softw. 4. URL http://www.stat.ucla.edu/journals/jss.

MELNYKOV, V. & MAITRA, R. (2010). Finite mixture models and model-based clustering. Stat. Surv. 4, 80–116.

NAKLANG, K., HARNPICHITVITAYA, D., AMARANTE, S.T., WADE, L.J. & HAEFELE, S.M. (2006). Internal efficiency, nutrient uptake, and the relation to field water resources in rainfed lowland rice of northeast Thailand. Plant Soil 286, 193–208.

NG, S.K. (2013). Recent developments in expectation-maximization methods for analyzing complex data. WIREs Comput Stat 5, 415–431.

NG, S.K., MCLACHLAN, G.J., WANG, K., JONES, L.B.T. & NG, S.W. (2006). A mixture model with random-effects components for clustering correlated gene-expression profiles. Bioinformatics 22, 1745–1752.

NTANOS, D.A. & KOUTROUBAS, S.D. (2002). Dry matter and N accumulation and translocation for Indica and Japonica rice under Mediterranean conditions. Field Crop. Res. 74, 93–101.

OMRAN, M.G.H., ENGELBRECHT, A.P. & SALMAN, A. (2007). An overview of clustering methods. Intell. Data Anal. 11, 583–605.

PENG, S., GARCIA, F.V., LAZA, R.C., SANICO, A.L., VISPERAS, R.M. & CASSMAN, K.G. (1996). Increased N-use efficiency using a chlorophyll meter on high-yielding irrigated rice. Field Crop. Res. 47, 243–252.

R CORE TEAM (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/. ISBN 3-900051-07-0.

REDNER, R.A. & WALKER, H.F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26, 195–239.

SCHWARZ, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.

SEIDEL, W., MOSLER, K. & ALKER, M. (2000). A cautionary note on likelihood ratio tests in mixture models. Ann. Inst. Statist. Math. 52, 481–487.

TANGUILIG, V.C., YAMBAO, E.B., O'TOOLE, J.C. & DE DATTA, S.K. (1987). Water stress effects on leaf elongation, leaf water potential, transpiration, and nutrient uptake of rice, maize, and soybean. Plant Soil 103, 155–168.


VSN INTERNATIONAL (2012). Genstat for Windows 15th Edition. VSN International, Hemel Hempstead UK. URL http://www.vsni.co.uk/.

WADE, L.J., AMARANTE, S.T., OLEA, A. et al. (1999). Nutrient requirements in rainfed lowland rice. Field Crop. Res. 64, 91–107.

WITT, C., DOBERMANN, A., ABDULRACHMAN, S. et al. (1999). Internal nutrient efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field Crop. Res. 63, 113–138.

WU, C.F.J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11, 95–103.

XU, G., FAN, X. & MILLER, A.J. (2012). Plant nitrogen assimilation and use efficiency. Annu. Rev. Plant Biol. 63, 153–182.

XU, L., HANSON, T., BEDRICK, E.J. & RESTREPO, C. (2010). Hypothesis tests on mixture model components with applications in ecology and agriculture. J. Agric. Biol. Environ. Stat. 15, 308–326.


TABLE 1 Fertiliser treatments applied in the experiments in Naklang et al. (2006)

| Treatment | Description |
|---|---|
| Control | No fertiliser applied |
| PK | 0 N, 21.8 kg phosphorus (P) ha⁻¹, and 41.5 kg potassium (K) ha⁻¹ |
| N | 50 kg N ha⁻¹ |
| FYM | Farmyard manure (cattle manure) at 10 t ha⁻¹ (fresh weight) |
| NPK | 50 kg N ha⁻¹, 21.8 kg P ha⁻¹ and 41.5 kg K ha⁻¹ |
| CR NPK | 50 kg N ha⁻¹ (controlled-release), 21.8 kg P ha⁻¹, and 41.5 kg K ha⁻¹ |
| FYM NPK | Combined application of the treatments FYM and NPK |
| ALL | NPK as in the NPK treatment + lime and trace elements |

TABLE 2 Number of sites selected from the experiment in Naklang et al. (2006) over the three years with soils reported to be dry, wet or having ponded water at three developmental stages of the plant

| Soil status | Pre-flowering | Flowering | Post-flowering |
|---|---|---|---|
| Dry | 0 | 1 | 6 |
| Wet | 8 | 3 | 8 |
| Ponded water | 10 | 14 | 4 |

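TABLE 3 Diagram of the strip plot design for two blocks at one site for the experiment in the case study (Naklang et al. 2006). Columns are treatment strips and rows are year strips; each cell is one plot.

Block 1

| Year | NPK | FYM | PK | ALL | N | CR NPK | Control | FYM NPK |
|---|---|---|---|---|---|---|---|---|
| 1995 | | | | | | | | |
| 1996 | | | | | | | | |
| 1997 | | | | | | | | |

Block 2

| Year | ALL | CR NPK | Control | PK | NPK | FYM | FYM NPK | N |
|---|---|---|---|---|---|---|---|---|
| 1995 | | | | | | | | |
| 1996 | | | | | | | | |
| 1997 | | | | | | | | |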


TABLE 4 Estimated means (standard error of the mean) of internal nitrogen use efficiency (IEN), nitrogen uptake (NU) and grain yield (GY) for the fertiliser treatments described in Table 1 across sites, years and blocks

| Treatment | Means of IEN | Means of NU (kg/ha) | Means of GY (kg/ha) |
|---|---|---|---|
| Control | 48.87 (2.39) | 36.22 (7.34) | 1677 (266.50) |
| PK | 48.69 (2.39) | 35.53 (7.34) | 1629 (266.50) |
| N | 42.71 (2.39) | 52.94 (7.34) | 2192 (266.50) |
| FYM | 46.29 (2.39) | 49.66 (7.34) | 2191 (266.50) |
| NPK | 43.16 (2.39) | 54.51 (7.34) | 2305 (266.50) |
| CR NPK | 45.78 (2.39) | 52.53 (7.34) | 2307 (266.50) |
| FYM NPK | 37.92 (2.39) | 69.39 (7.34) | 2552 (266.50) |
| ALL | 40.76 (2.39) | 61.18 (7.34) | 2445 (266.50) |

TABLE 5 Estimated variance components (standard error) of the linear mixed model of internal nitrogen use efficiency

| Random term | Estimated variance component |
|---|---|
| site | 5.52 (14.18) |
| site.block | 3.18 (1.96) |
| site.year | 34.92 (18.09) |
| site.treatment | 1.57 (4.58) |
| site.treatment.year | 30.93 (7.48) |

TABLE 6 Estimated variance and covariance components (standard error) of the bivariate mixed model of nitrogen uptake (NU) and grain yield (GY)

| Random term | Variance for NU | Variance for GY | Correlation |
|---|---|---|---|
| site | 245.44 (176.43) | 318818.03 (243873.43) | 0.96 (0.05) |
| site.block | 3.95 (2.86) | 1481.32 (1895.60) | 0.99 (NA) |
| site.year | 74.95 (39.69) | 173236.37 (81415.70) | 0.76 (0.14) |
| site.treatment | 33.47 (12.90) | 25672.31 (12386.48) | 0.93 (0.12) |
| site.treatment.year | 31.96 (9.76) | 37150.76 (12200.00) | 0.92 (0.15) |


TABLE 7 Settings and rationales for the mixture analyses

| Setting | Rationale |
|---|---|
| Maximum number of components fixed at 6 | Larger numbers resulted in frequent cases of spuriosities |
| Starting strategy i-ii. Number of random and k-means partitions set at 100 | Ensure to have a good initial partition. However, similar mixture estimates were observed when reducing this number |
| Starting strategy iii. Not used for IEN analysis | May produce negative initial values of the means of IEN. Biologically impossible |
| Starting strategy iv. Subsamples of 200 observations | Subsample size sufficient to not produce degenerate solutions |
| Starting strategy v. A short run indicated that the threshold to stop the EM algorithm was $c = 10^{-2}$. The number of short runs employed was three | When initiating the short run with a large number of random starts, similar mixture estimates were obtained with thresholds between $[10^{-4}, 10^{-1}]$ or larger number of short runs |
| Stopping criteria. The threshold to stop the EM algorithm was $c = 10^{-6}$. Standard setting | NA |
| Biological spuriosities. Components with negative mean for IEN, and components with negative joint mean or correlation for (NU, GY) | Biologically impossible |

TABLE 8 Starting strategy and AIC and BIC values for the solution with the highest value of the log likelihood function when fitting g = 1, ..., 6 groups for the analysis of IEN. The minimum values of the information criteria are highlighted in bold.

| Number of groups | Starting strategy | AIC | BIC |
|---|---|---|---|
| 6 | K-means | 3287.43 | 3356.60 |
| 5 | K-means | 3281.43 | 3338.39 |
| 4 | subsample solution | 3275.66 | 3320.42 |
| 3 | K-means | 3272.12 | 3304.67 |
| 2 | short runs | **3270.13** | **3290.47** |
| 1 | random starts | 3296.07 | 3304.20 |

TABLE 9 Parameter estimates (bootstrapped standard error) of the mixture model selected in Table 8 for the analysis of internal nitrogen use efficiency ($\hat{\pi}_i$ is the estimate of the mixing proportion, $\hat{\mu}_i$ the estimate of the mean of the ratio and $\hat{\sigma}_i$ is the estimate of the standard deviation for the i-th group)

| Group | $\hat{\pi}_i$ | $\hat{\mu}_i$ | $\hat{\sigma}_i$ |
|---|---|---|---|
| First group | 0.45 (0.14) | 38.36 (1.12) | 6.14 (1.02) |
| Second group | 0.55 (0.14) | 49.01 (3.26) | 11.60 (1.25) |


TABLE 10 Starting strategy and AIC and BIC values of the solution with the highest value of the log likelihood function when fitting g = 1, ..., 6 groups for the joint analysis of grain yield and nitrogen uptake. The minimum values of the information criteria are highlighted in bold.

| Number of groups | Starting strategy | AIC | BIC |
|---|---|---|---|
| 6 | random starts | 10335.46 | 10477.85 |
| 5 | subsample solution | **10331.07** | 10449.06 |
| 4 | short runs | 10336.34 | 10429.91 |
| 3 | simulated means | 10333.03 | **10402.19** |
| 2 | K-means | 10363.44 | 10408.19 |
| 1 | K-means | 10444.21 | 10464.55 |

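TABLE 11 Parameter estimates of the mixture model (bootstrapped standard errors in the second panel) selected by the BIC criterion (Table 10) for the joint analysis of grain yield and nitrogen uptake ($\hat{\pi}_i$ is the estimate of the mixing proportion, $\hat{\mu}_i$ the estimate of the joint mean of (NU, GY), $\widehat{IE}_{Ni}$ is the ratio of the components of the estimated joint mean, $\hat{\Sigma}_i$ is the estimate of the covariance matrix and $\hat{\rho}_i$ is the estimate of the correlation for the i-th group). Matrices are written row by row, with rows separated by semicolons.

Estimates

| Group | $\hat{\pi}_i$ | $\hat{\mu}_i$ | $\widehat{IE}_{Ni}$ | $\hat{\Sigma}_i$ | $\hat{\rho}_i$ |
|---|---|---|---|---|---|
| First group | 0.15 | (20.42, 1070.22) | 52.41 | [37.65, 2607.11; 2607.11, 225208.08] | 0.90 |
| Second group | 0.41 | (42.43, 1866.97) | 44.00 | [118.25, 3917.71; 3917.71, 253491.45] | 0.72 |
| Third group | 0.44 | (70.66, 2814.10) | 39.82 | [322.31, 5272.20; 5272.20, 308423.50] | 0.52 |

Bootstrapped standard errors

| Group | $\hat{\pi}_i$ | $\hat{\mu}_i$ | $\widehat{IE}_{Ni}$ | $\hat{\Sigma}_i$ | $\hat{\rho}_i$ |
|---|---|---|---|---|---|
| First group | 0.03 | (2.09, 157.00) | 3.53 | [15.30, 1030.00; 1030.00, 75700.00] | 0.05 |
| Second group | 0.06 | (2.27, 114.00) | 1.67 | [24.00, 1021.00; 1021.00, 69200.00] | 0.06 |
| Third group | 0.05 | (2.68, 100.00) | 1.27 | [42.80, 1540.00; 1540.00, 61300.00] | 0.09 |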


Figure 1. Typical trend of grain yield on nitrogen uptake from the experiments considered in Section 3. The closed and open circles represent plots with no fertiliser added and plots with a source of fertiliser, respectively.

Figure 2. The Q-Q plot for the residuals of internal nitrogen use efficiency versus the expected quantiles of a normal distribution (left) and the plot of residuals of internal nitrogen use efficiency versus estimated means (right).


Figure 3. Classification of the internal nitrogen use efficiency observations presented against the experimental sites for plots receiving a) FYM NPK, b) N and Control and c) all plots together with the water status post-flowering. The closed and open symbols refer to observations classified in the first and second group, respectively.


Figure 4. Mixture groups for grain yield and nitrogen uptake data considered in Section 3. The black dots represent the estimated joint means and the ellipses the 90% prediction regions of the groups (Eq. 4.4 in Chew (1966)). The intervals are the confidence sets of the ratios of expectations of the groups calculated according to Fieller (1954).


Figure 5. Mixture groups for the grain yield and nitrogen uptake data considered in Section 3 together with a) the water status of the plots post-flowering and b) N fertiliser status. The ellipses are the 90% prediction regions of the groups (Eq. 4.4 in Chew (1966)).

Chapter 5

Conclusions and future lines of research

In this thesis I have proposed to perform bivariate mixed and mixture analyses on (NU, GY) instead of their univariate counterparts on IEN. Bivariate analyses maintain the complete information on NU and GY, including their correlation, and avoid dealing with the distributional properties of the ratio.

The IEN data may violate the assumption of normality and homogeneity of error variances (Marsaglia, 1965, 2006) required in linear mixed models. As for the application of mixtures of univariate Gaussian components, the potential non-normality of the ratio distribution (Marsaglia, 1965, 2006) also complicates its analysis for the following reasons. Firstly, the ratio may present heavy-tailedness; thus, its distribution may not be satisfactorily approximated by a mixture of univariate Gaussian components. Atypical observations of the ratio, produced when the denominator takes values close to zero, can affect the estimates of the univariate Gaussian components (Peel & McLachlan, 2000). Secondly, if in a physical group the ratio distribution presents skewness or bimodality, more than one Gaussian component will be required to approximate a non-symmetric shape. This breaks the one-to-one correspondence between mixture components and physical groups and requires dealing with mixtures of mixtures. Mixtures of mixtures arise when each component of the mixture is itself a mixture. That is:

\[
f(y_j \mid \psi) = \sum_{i=1}^{g} \pi_i f_i(y_j \mid \theta_i) = \sum_{i=1}^{g} \pi_i \left( \sum_{k=1}^{r_i} \alpha_{ik}\, g_{ik}(y_j \mid \xi_{ik}) \right) \tag{5.1}
\]

where the $g_{ik}$ can be assumed to be univariate normal. Mixtures of mixtures have problems of non-identifiability (Di Zio et al., 2007). One of them is to choose which components of the second level of the mixture ($g_{ik}$) are used to model the non-Gaussian pdfs of the first level of the mixture ($f_i$). Constraints on the parameters are needed to impose identifiability (see Willse & Boik, 1999).
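The non-identifiability of Eq. 5.1 can be illustrated numerically. The following is a minimal Python sketch, not part of the original analysis: it evaluates the two-level density for two different allocations of the same four univariate Gaussian components to the first-level groups and checks that the overall density is unchanged. All parameter values and function names are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_of_mixtures_pdf(y, groups):
    """Evaluate Eq. 5.1 at y.

    `groups` is a list of (pi_i, components) pairs; `components` is a list of
    (alpha_ik, mean_ik, sd_ik) triples describing the second-level Gaussians.
    """
    total = 0.0
    for pi_i, components in groups:
        inner = sum(alpha * normal_pdf(y, mean, sd) for alpha, mean, sd in components)
        total += pi_i * inner
    return total


# Hypothetical parameters: two first-level groups, each a two-component mixture.
# The implied overall weights of the four Gaussians are 0.30, 0.20, 0.35, 0.15.
grouping_a = [
    (0.50, [(0.60, 40.0, 5.0), (0.40, 55.0, 8.0)]),
    (0.50, [(0.70, 70.0, 6.0), (0.30, 90.0, 10.0)]),
]

# The same four Gaussians with the same overall weights, but the second and
# third components are allocated to the other first-level group.
grouping_b = [
    (0.65, [(0.30 / 0.65, 40.0, 5.0), (0.35 / 0.65, 70.0, 6.0)]),
    (0.35, [(0.20 / 0.35, 55.0, 8.0), (0.15 / 0.35, 90.0, 10.0)]),
]

grid = [20.0 + 5.0 * i for i in range(19)]
max_gap = max(
    abs(mixture_of_mixtures_pdf(y, grouping_a) - mixture_of_mixtures_pdf(y, grouping_b))
    for y in grid
)
print(max_gap)  # differs from zero only by floating-point rounding
```

Because both groupings give the same density, the data alone cannot decide which second-level components belong to which first-level group; this is why parameter constraints are required.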

Bivariate mixed models on (NU, GY) are appropriate for the analysis of designed field trials, in which the researcher aims to test the performance of different experimental treatments. In our literature study of recent publications on nitrogen efficiency in cereals (Section 1.4), the experimental treatments mostly corresponded to different levels of nutrients added to the soil. However, the level of available nutrients (from indigenous as well as added sources) may differ substantially from the level of applied nutrients. For instance, heavy rainfall after an application of nitrates can result in considerable losses of these nutrients through leaching or surface runoff (Raun & Johnson, 1999). Thus, in order to apply mixed models, special care must be taken with the design of such trials, and agronomic practices must be tightly controlled, to ensure that treatment applications are as uniform as possible.

There are other types of studies where fertilisers are applied to create different fertility conditions rather than to test their performance. This is the case of Naklang et al. (2006), whose research objectives were not formulated in terms of treatment-based analyses. Naklang et al. (2006) aimed to improve the understanding of how different environmental conditions affect IEN in rice. With this objective in mind, several experiments across different sites and years were carried out to widen the range of soil characteristics and of climatic and fertility conditions. In this thesis I have argued that different experimental and field conditions may lead to different patterns of the conversion of NU into GY, and thus to the presence of clusters in the field data. Such clusters can be identified by bivariate mixture models. The inspection of the mixture groups may reveal the environmental and fertility conditions defining them, assuming that the appropriate information has been recorded. Thus, mixture models are recommended for studies with objectives similar to those of Naklang et al. (2006).

The current methodology for mixture models assumes the data to be independent. However, NU and GY field data are commonly collected from designed field trials, with correlation between observations induced by the design. In order to reduce the correlation, one can apply mixture models to the observations adjusted for the spatial trend of the field. However, in our case study the information on the spatial layout of the designed field trial, required for modelling the spatial trends, was missing. Despite the potential violation of the independence assumption, mixture models were useful in the analysis of the case study for the identification of different groups and allowed us to highlight the effect of N availability and post-flowering soil water conditions on NU and GY.

The results obtained by applying mixture models can add extra information to those obtained by mixed models in designed field trials. For instance, if there are non-controlled factors which override, or interact with, the designed treatments, the inspection of the mixture groups may help to reveal them. On the other hand, if the fertiliser treatments are the main cause of the different patterns of (NU, GY), one would expect these groups to be revealed by the mixture approach and significant differences to be found when treatment contrasts are tested in the mixed models.

In conclusion, finite mixture models of bivariate Gaussian distributions are useful for identifying potential experimental and environmental conditions that segregate the NU and GY data into clusters. In designed field trials, because of the potential violation of the independence assumption, we recommend using mixture models for exploratory purposes only and as a complement to mixed models. In order to fully exploit the potential of the technique, it is important to implement sampling procedures which avoid correlated observations, e.g. data collection from simple random field surveys.

A number of interesting questions have arisen during the development of this research. One immediate future line of research is to carry out simulation studies to assess the coverage of Fieller's confidence intervals for E(GY)/E(NU) (Fieller, 1954) estimated jointly for each mixture group (Figure 4, Chapter 4). Samples from mixtures with known parameters will be generated with the R package EMMIX (McLachlan et al., 1999). Then, Fieller's confidence intervals will be calculated, and the percentage of times that these intervals contain the true ratio of means for each group will provide a measure of their coverage. In particular, it will be interesting to investigate how the coverage is affected by changing the area of intersection between the clusters.
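The structure of such a coverage study can be sketched as follows. This is a simplified Python illustration rather than the proposed EMMIX-based study: it uses a single bivariate Gaussian group with known membership (so no mixture is fitted), hypothetical parameter values, and a normal quantile in place of the t quantile. The check relies on the fact that the Fieller confidence set contains a candidate ratio ρ exactly when the squared sample mean of GY − ρ·NU does not exceed the squared quantile times its estimated variance.

```python
import math
import random
from statistics import NormalDist


def draw_pair(rng, mu_nu, mu_gy, sd_nu, sd_gy, corr):
    """Draw one (NU, GY) pair from a bivariate normal using two standard normals."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    nu = mu_nu + sd_nu * u
    gy = mu_gy + sd_gy * (corr * u + math.sqrt(1.0 - corr ** 2) * v)
    return nu, gy


def fieller_set_contains(nu, gy, ratio0, level=0.95):
    """Return True if the Fieller confidence set for E(GY)/E(NU) contains ratio0."""
    n = len(nu)
    m_nu = sum(nu) / n
    m_gy = sum(gy) / n
    s_nn = sum((x - m_nu) ** 2 for x in nu) / (n - 1)
    s_gg = sum((y - m_gy) ** 2 for y in gy) / (n - 1)
    s_ng = sum((x - m_nu) * (y - m_gy) for x, y in zip(nu, gy)) / (n - 1)
    # Assumption: large-sample normal quantile instead of the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Estimated variance of the sample mean of GY - ratio0 * NU.
    var_lin = (s_gg - 2.0 * ratio0 * s_ng + ratio0 ** 2 * s_nn) / n
    return (m_gy - ratio0 * m_nu) ** 2 <= z ** 2 * var_lin


def estimate_coverage(n_rep=2000, n=50, mu_nu=80.0, mu_gy=4000.0,
                      sd_nu=15.0, sd_gy=800.0, corr=0.8, seed=1):
    """Proportion of simulated samples whose Fieller set contains the true ratio."""
    rng = random.Random(seed)
    true_ratio = mu_gy / mu_nu
    hits = 0
    for _ in range(n_rep):
        pairs = [draw_pair(rng, mu_nu, mu_gy, sd_nu, sd_gy, corr) for _ in range(n)]
        nu = [p[0] for p in pairs]
        gy = [p[1] for p in pairs]
        if fieller_set_contains(nu, gy, true_ratio):
            hits += 1
    return hits / n_rep


print(estimate_coverage())
```

In the full study, the known group membership would be replaced by the groups estimated from a fitted mixture, and the simulation would be repeated for increasing degrees of overlap between the clusters.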

A research gap identified during the development of this research is how to design field surveys with the aim of applying mixture models. Although mixture models have been widely applied in surveys (e.g. Di Zio et al., 2005; Genge, 2013), to the best of our knowledge and in agreement with Cressie (2014, pers. comm., 27 February), there are insufficient details in the literature on how to carry out surveys with post-stratification objectives. The simplest method would be to collect a large simple random sample from the field. However, it is not clear how large the sample needs to be or how to collect the data so as not to introduce bias in the mixing proportions.

Not only the data collection protocol but also the way NU is measured by biologists has been identified as an issue in this research. Nitrogen uptake is a derived variable calculated from GY (Eq. 1.1). This induces a spurious correlation between the two variables, as is clearly shown in Table 6 (Chapter 4), where the estimated correlation approaches the boundary of the parameter space.
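The mechanism behind this spurious correlation can be illustrated with a small simulation. The sketch below is a Python illustration under a simplifying assumption: NU is taken to be the product of GY and a grain N concentration that is generated independently of GY. Eq. 1.1 is not reproduced here and may contain further terms (for example a straw contribution), and all numerical values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_derived_uptake(n=5000, seed=0):
    """Simulate GY, an independent N concentration, and the derived NU."""
    rng = random.Random(seed)
    gy = [rng.gauss(4.0, 1.0) for _ in range(n)]     # hypothetical grain yield
    conc = [rng.gauss(12.0, 1.0) for _ in range(n)]  # hypothetical N concentration
    nu = [g * c for g, c in zip(gy, conc)]           # assumed form: NU = GY x concentration
    return gy, conc, nu


gy, conc, nu = simulate_derived_uptake()
print(pearson(gy, conc))  # close to zero: the measured quantities are independent
print(pearson(gy, nu))    # large: induced by using GY to compute NU
```

Under these assumptions the correlation between GY and NU is high even though the two measured quantities are independent, which is consistent with the near-boundary estimates reported in Table 6.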

Finally, it would be interesting to further explore the topic of mixtures of mixtures for modelling IEN, in particular the identification of valid constraints on the mixture parameters to impose identifiability. Willse & Boik (1999) proposed constraints of the type $\mu_{is} = \mu_i + \theta_s$. Here $\mu_{is}$ is the mean of the s-th component of the second level of the mixture in the i-th component of the first level of the mixture, $\mu_i$ is the mean of the first component of the second level of the mixture in the i-th component of the first level of the mixture, and $\theta_s$ is the deviation between the means of the components of the second level of the mixture. Even with these constraints, we may expect that different starting values $\theta_s^{(0)}$ for initiating the EM algorithm will result in different components of the second level of the mixture modelling the non-Gaussian pdfs of the first level of the mixture. Alternatively, I suggest the use of mixtures of skew-normal or skew-t distributions (Frühwirth-Schnatter & Pyne, 2010) to model the distribution of the ratio. This is another topic that would be interesting to study in the future.


Appendix A

List of the studies in the review

C. Adhikari, K.F. Bronson, G.M. Panuallah, A.P. Regmi, P.K. Saha, A. Dobermann, D. C. Olk, P.R. Hobbs, and E. Pasuquin. On-farm soil N supply and N nutrition in the rice-wheat system of Nepal and Bangladesh. Field Crops Research, 64:273–286, 1999.

R. Albrizio, M. Todorovic, T. Matic, and A. M. Stellacci. Comparing the interactive effects of water and nitrogen on durum wheat and barley grown in a Mediterranean environment. Field Crops Research, 115:179–190, 2010.

W.K. Anderson and F.C. Hoyle. Nitrogen efficiency of wheat cultivars in a Mediterranean environment. Australian Journal of Experimental Agriculture, 39:957–965, 1999.

M.S. Aulakh, T.S. Khera, J.W. Doran, Kuldip-Singh, and Bijay-Singh. Yields and nitrogen dynamics in a rice-wheat system using green manure and inorganic fertilizer. Soil Science Society of America Journal, 64:1867–1876, 2000.

P. Belder, B.A.M. Bouman, R. Cabangon, L. Guoan, E.J.P. Quilang, L. Yuanhua, J.H.J. Spiertz, and T.P. Tuong. Effect of water-saving irrigation on rice yield and water use in typical lowland conditions in Asia. Agricultural Water Management, 65:193–210, 2004.

P. Belder, B.A.M. Bouman, J.H.J. Spiertz, S. Peng, A.R. Castaneda, and R.M. Visperas. Crop performance, nitrogen and water use in flooded and aerobic rice. Plant and Soil, 273:167–182, 2005.

Bijay-Singh, K.F. Bronson, Yadvinder-Singh, T.S. Khera, and E. Pasuquin. Nitrogen-15 balance as affected by rice straw management in a rice-wheat rotation in northwest India. Nutrient Cycling in Agroecosystems, 59:227–237, 2001.

Bijay-Singh, Varinderpal-Singh, Yadvinder-Singh, H.S. Thind, A. Kumar, R.K. Gupta, A. Kaul, and M. Vashistha. Fixed-time adjustable dose site-specific fertilizer nitrogen management in transplanted irrigated rice (Oryza sativa L.) in South Asia. Field Crops Research, 126:63–69, 2012.

A.K. Borrell, A.L. Garside, S. Fukai, and D.J. Reid. Season, nitrogen rate, and plant type affect nitrogen uptake and nitrogen use efficiency in rice. Australian Journal of Agricultural Research, 49:829–843, 1998.

R.J. Cabangon, T. P. Tuong, E. G. Castillo, L. X. Bao, G. Lu, G. Wang, Y. Cui, B.A.M. Bouman, Y. Li, C. Chen, and J. Wang. Effect of irrigation method and N-fertilizer management on rice yield, water productivity and nutrient-use efficiencies in typical lowland rice conditions in China. Paddy and Water Environment, 2:195–206, 2004.

K.G. Cassman, M.J. Kropff, J. Gaunt, and S. Peng. Nitrogen use efficiency of rice reconsidered: What are the key constraints? Plant and Soil, 155-156:359–362, 1993.

K.G. Cassman, A. Dobermann, P.C. Sta Cruz, G.C. Gines, M.I. Samson, J.P. Descalsota, J.M. Alcantara, M.A. Dizon, and D.C. Olk. Soil organic matter and the indigenous nitrogen supply of intensive irrigated rice systems in the tropics. Plant and Soil, 182:267–278, 1996a.

K.G. Cassman, G.C. Gines, M.A. Dizon, M.I. Samson, and J.M. Alcantara. Nitrogen-use efficiency in tropical lowland rice systems: contributions from indigenous and applied nitrogen. Field Crops Research, 47:1–12, 1996b.

D. Chakraborty, R.N. Garg, R.K. Tomar, R. Singh, S.K. Sharma, R.K. Singh, S.M. Trivedi, R.B. Mittal, P.K. Sharma, and K.H. Kamble. Synthetic and organic mulching and nitrogen effect on winter wheat (Triticum aestivum L.) in a semi-arid environment. Agricultural Water Management, 97:738–748, 2010.

X. Chen, J. Zhou, X. Wang, A.M. Blackmer, and F. Zhang. Optimal rates of nitrogen fertilization for a winter wheat-corn cropping system in northern China. Communications in Soil Science and Plant Analysis, 35:583–597, 2004.

L. Chuan, P. He, J. Jin, S. Li, C. Grant, X. Xu, S. Qiu, S. Zhao, and W. Zhou. Estimating nutrient uptake requirements for wheat in China. Field Crops Research, 146:96–104, 2013.

J.M. Clarke, C.A. Campbell, H.W. Cutforth, R.M. DePauw, and G.E. Winkleman. Nitrogen and phosphorus uptake, translocation, and utilization efficiency of wheat in relation to environment and cultivar yield and protein levels. Canadian Journal of Plant Science, 70:965–977, 1990.

M.K. Conyers, C. Tang, G.J. Poile, D.L. Liu, D. Chen, and Z. Nuruzzaman. A combination of biological activity and the nitrate form of nitrogen can be used to ameliorate subsurface soil acidity under dryland wheat farming. Plant and Soil, 348:155–166, 2011.

C. M. Cossani, C. Thabet, H. J. Mellouli, and G.A. Slafer. Improving wheat yields through N fertilization in Mediterranean Tunisia. Experimental Agriculture, 47:459–475, 2011.

C.M. Cossani, G.A. Slafer, and R. Savin. Nitrogen and water use efficiencies of wheat and barley under a Mediterranean environment in Catalonia. Field Crops Research, 128:109–118, 2012.

Z. Cui, F. Zhang, X. Chen, Y. Miao, J. Li, L. Shi, J. Xu, Y. Ye, C. Liu, Z. Yang, Q. Zhang, S. Huang, and D. Bao. On-farm evaluation of an in-season nitrogen management strategy based on soil N min test. Field Crops Research, 105:48–55, 2008.

Z. Cui, F. Zhang, X. Chen, F. Li, and Y. Tong. Using in-season nitrogen management and wheat cultivars to improve nitrogen use efficiency. Soil Science Society of America Journal, 75:976–983, 2011.

X.Q. Dai, H.Y. Zhang, J.H.J. Spiertz, J. Yu, G.H. Xie, and B.A.M. Bouman. Crop response of aerobic rice and winter wheat to nitrogen, phosphorus and potassium in a double cropping system. Nutrient Cycling in Agroecosystems, 86:301–315, 2010.

D.K. Das, D. Maiti, and H. Pathak. Site-specific nutrient management in rice in Eastern India using a modeling approach. Nutrient Cycling in Agroecosystems, 83:85–94, 2009.

S.K. De Datta, R.J. Buresh, M.I. Samson, and Kai-Rong Wang. Nitrogen use efficiency and nitrogen-15 balances in broadcast-seeded flooded and transplanted rice. Soil Science Society of America Journal, 52:849–855, 1988.

G. Delogu, L. Cattivelli, N. Pecchioni, D. De Falcis, T. Maggiore, and A.M. Stanca. Uptake and agronomic efficiency of nitrogen in winter barley and winter wheat. European Journal of Agronomy, 9:11–20, 1998.

A. Dobermann, C. Witt, D. Dawe, S. Abdulrachman, H.C. Gines, R. Nagarajan, S. Satawathananont, T.T. Son, P.S. Tan, G.H. Wang, N.V. Chien, V.T.K. Thoa, C.V. Phung, P. Stalin, P. Muthukrishnan, V. Ravi, M. Babu, S. Chatuporn, J. Sookthongsa, Q. Sun, R. Fu, G.C. Simbahan, and M.A.A. Adviento. Site-specific nutrient management for intensive rice cropping systems in Asia. Field Crops Research, 74:37–66, 2002.

A. Dobermann, C. Witt, S. Abdulrachman, H.C. Gines, R. Nagarajan, T.T. Son, P.S. Tan, G.H. Wang, N.V. Chien, V.T.K. Thoa, C.V. Phung, P. Stalin, P. Muthukrishnan, V. Ravi, M. Babu, G.C. Simbahan, and M.A.A. Adviento. Soil fertility and indigenous nutrient supply in irrigated rice domains of Asia. Agronomy Journal, 95:913–923, 2003.

A.D. Doyle and I.C.R. Holford. The uptake of nitrogen by wheat, its agronomic efficiency and their relationship to soil and fertilizer nitrogen. Australian Journal of Agricultural Research, 44:1245–1258, 1993.

A.D. Doyle and C.C. Leckie. Recovery of fertiliser nitrogen in wheat grain and its implications for economic fertiliser use. Australian Journal of Experimental Agriculture, 32:383–387, 1992.

Y. Duan, M. Xu, X. He, S. Li, and X. Sun. Long-term pig manure application reduces the requirement of chemical phosphorus and potassium in two rice-wheat sites in subtropical China. Soil Use and Management, 27:427–436, 2011.

Q. Fang, Q. Yu, E. Wang, Y. Chen, G. Zhang, J. Wang, and L. Li. Soil nitrate accumulation, leaching and crop nitrogen use as influenced by fertilization and irrigation in an intensive wheat–maize double cropping system in the North China Plain. Plant and Soil, pages 335–350, 2006.

R.A. Fischer. Irrigated spring wheat and timing and amount of nitrogen fertilizer. II. physiology of grain yield response. Field Crops Research, 33:57–80, 1993.

R.A. Fischer, G.N. Howe, and Z. Ibrahim. Irrigated spring wheat and timing and amount of nitrogen fertilizer. I. grain yield and protein content. Field Crops Research, 33:37–56, 1993.

L.E. Gauer, C.A. Grant, D.T. Gehl, and L.D. Bailey. Effects of nitrogen fertilization on grain protein content, nitrogen uptake, and nitrogen use efficiency of six spring wheat (Triticum aestivum L.) cultivars, in relation to estimated moisture supply. Canadian Journal of Plant Science, 72:235–241, 1992.

B.B. Ghaley, H. Høgh-Jensen, and J.L. Christiansen. Recovery of nitrogen fertilizer by traditional and improved rice cultivars in the Bhutan Highlands. Plant and Soil, 332:233–246, 2010.

D. Giambalvo, P. Ruisi, G. Di Miceli, A. S. Frenda, and G. Amato. Nitrogen use efficiency and nitrogen fertilizer recovery of durum wheat genotypes as affected by interspecific competition. Agronomy Journal, 102:707–715, 2010.

G. Guarda, S. Padovan, and G. Delogu. Grain yield, nitrogen-use efficiency and baking quality of old and modern Italian bread-wheat cultivars grown at different nitrogen levels. European Journal of Agronomy, 21:181–192, 2004.

S.M. Haefele, M.C.S. Wopereis, M.K. Ndiaye, S.E. Barro, and M. Ould Isselmou. Internal nutrient efficiencies, fertilizer recovery rates and indigenous nutrient supply of irrigated lowland rice in Sahelian West Africa. Field Crops Research, 80:19–32, 2003.

S.M. Haefele, S.M.A. Jabbar, J.D.L.C. Siopongco, A. Tirol-Padre, S.T. Amarante, P.C. Sta Cruz, and W.C. Cosico. Nitrogen use efficiency in selected rice (Oryza sativa L.) genotypes under different water regimes and nitrogen levels. Field Crops Research, 107:137–146, 2008.

T. Horie, M. Ohnishi, J.F. Angus, L.G. Lewin, T. Tsukaguchi, and T. Matano. Physi- ological characteristics of high-yielding rice inferred from cross-location experiments. Field Crops Research, 52:55–67, 1997.

M.F. Hossain, S.K. White, S.F. Elahi, N. Sultana, M.H.K. Choudhury, Q.K. Alam, J.A. Rother, and J.L. Gaunt. The efficiency of nitrogen fertiliser for rice in Bangladeshi farmers fields. Field Crops Research, 93:94–107, 2005.

J. Huang, F. He, K. Cui, R. J. Buresh, B. Xu, W. Gong, and S. Peng. Determination of optimal nitrogen rate for rice varieties using a chlorophyll meter. Field Crops Research, 105:70–80, 2008.

L. Jiang, D. Dong, X. Gan, and S. Wei. Photosynthetic efficiency and nitrogen distribution under different nitrogen management and relationship with physiological N-use efficiency in three rice genotypes. Plant and Soil, 271:321–328, 2005.

Q. Jing, B.A.M. Bouman, H. Hengsdijk, H. Van Keulen, and W. Cao. Exploring options to combine high yields with high nitrogen use efficiencies in irrigated rice in China. European Journal of Agronomy, 26:166–177, 2007.

Q. Jing, B. Bouman, H. van Keulen, H. Hengsdijk, W. Cao, and T. Dai. Disentangling the effect of environmental factors on yield and nitrogen uptake of irrigated rice in Asia. Agricultural Systems, 98:177–188, 2008.

Q. Jing, H. Van Keulen, H. Hengsdijk, W. Cao, P.S. Bindraban, T. Dai, and D. Jiang. Quantifying N response and N use efficiency in rice–wheat (RW) cropping systems under different water management. The Journal of Agricultural Science, 147:303–312, 2009.

N. Kalra, D. Chakraborty, P. Ramesh Kumar, M. Jolly, and P.K. Sharma. An approach to bridging yield gaps, combining response to water and other resource inputs for wheat in northern India, using research trials and farmers’ fields data. Agricultural Water Management, 93:54–64, 2007.

C.S. Khind and F.N. Ponnamperuma. Effects of water regime on growth, yield, and nitrogen uptake of rice. Plant and Soil, 59:287–298, 1981.

H.S. Khurana, S.B. Phillips, Bijay-Singh, M.M. Alley, A. Dobermann, A.S. Sidhu, Yadvinder-Singh, and S. Peng. Agronomic and economic evaluation of site-specific nutrient management for irrigated wheat in northwest India. Nutrient Cycling in Agroecosystems, 82:15–31, 2008.

D. Kumar, C. Devakumar, R. Kumar, A. Das, P. Panneerselvam, and Y.S. Shivay. Effect of neem-oil coated prilled urea with varying thickness of neem-oil coating and nitrogen rates on productivity and nitrogen-use efficiency of lowland irrigated rice under Indo-Gangetic Plains. Journal of Plant Nutrition, 33:1939–1959, 2010.

K. Kumar and K.M. Goh. Management practices of antecedent leguminous and non-leguminous crop residues in relation to winter wheat yields, nitrogen uptake, soil nitrogen mineralization and simple nitrogen balance. European Journal of Agronomy, 16:295–308, 2002.

J.K. Ladha, D. Dawe, T.S. Ventura, U. Singh, W. Ventura, and I. Watanabe. Long-term effects of urea and green manure on rice yields and nitrogen balance. Soil Science Society of America Journal, 64:1993–2001, 2000.

J. Le Gouis, D. Beghin, E. Heumez, and P. Pluchard. Genetic differences for nitrogen uptake and nitrogen utilisation efficiencies in winter wheat. European Journal of Agronomy, 12:163–173, 2000.

J. Liu, H. Liu, S. Huang, X. Yang, B. Wang, X. Li, and Y. Ma. Nitrogen efficiency in long-term wheat-maize cropping systems under diverse field sites in China. Field Crops Research, 118:145–151, 2010.

M. Liu, Z. Yu, Y. Liu, and N.T. Konijn. Fertilizer requirements for wheat and maize in China: The QUEFTS approach. Nutrient Cycling in Agroecosystems, 74:245–258, 2006.

X. Liu, P. He, J. Jin, W. Zhou, G. Sulewski, and S. Phillips. Yield gaps, indigenous nutrient supply, and nutrient use efficiency of wheat in China. Agronomy Journal, 103:1452–1463, 2011.

L. López-Bellido, R. J. López-Bellido, and F.J. López-Bellido. Fertilizer nitrogen efficiency in durum wheat under rainfed Mediterranean conditions: Effect of split application. Agronomy Journal, 98:55–62, 2006.

A. J. Macdonald and R.J. Gutteridge. Effects of take-all (Gaeumannomyces graminis var. tritici) on crop N uptake and residual mineral N in soil at harvest of winter wheat. Plant and Soil, 350:253–260, 2012.

D. Maiti, D.K. Das, and H. Pathak. Fertilizer requirement for irrigated wheat in eastern India using the QUEFTS simulation model. TheScientificWorldJOURNAL, 6:231–245, 2006.

A.M. McGuire, D.C. Bryant, and R.F. Denison. Wheat yields, nitrogen uptake, and soil moisture following winter legume cover crop vs. fallow. Agronomy Journal, 90:404–410, 1998.

K. Naklang, D. Harnpichitvitaya, S.T. Amarante, L.J. Wade, and S.M. Haefele. Internal efficiency, nutrient uptake, and the relation to field water resources in rainfed lowland rice of northeast Thailand. Plant and Soil, 286:193–208, 2006.

C. Noulas, I. Alexiou, J. M. Herrera, and P. Stamp. Course of dry matter and nitrogen accumulation of spring wheat genotypes known to vary in parameters of nitrogen use efficiency. Journal of Plant Nutrition, 36:1201–1218, 2013.

S.E. Ockerby, S.W. Adkins, and A.L. Garside. The uptake and use of nitrogen by paddy rice in fallow, cereal, and legume cropping systems. Australian Journal of Agricultural Research, 50:945–952, 1999.

D.C. Olk, K.G. Cassman, G. Simbahan, P.C. Sta. Cruz, S. Abdulrachman, R. Nagarajan, P.S. Tan, and S. Satawathananont. Interpreting fertilizer-use efficiency in relation to soil nutrient-supplying capacity, factor productivity, and agronomic efficiency. Nutrient Cycling in Agroecosystems, 53:35–41, 1999.

R. Ortiz-Monasterio, K.D. Sayre, S. Rajaram, and M. McMahon. Genetic progress in wheat yield and nitrogen use efficiency under four nitrogen rates. Crop Science, 37:898–904, 1997.

H. Pathak, P.K. Aggarwal, R. Roetter, N. Kalra, S.K. Bandyopadhaya, S. Prasad, and H. Van Keulen. Modelling the quantitative evaluation of soil nutrient supply, nutrient use efficiency, and fertilizer requirements of wheat in India. Nutrient Cycling in Agroecosystems, 65:105–113, 2003.

S.K. Patil, U. Singh, V.P. Singh, V.N. Mishra, R.O. Das, and J. Henao. Nitrogen dynamics and crop growth on an Alfisol and a Vertisol under a direct-seeded rainfed lowland rice-based system. Field Crops Research, 70:185–199, 2001.

S. Peng, F.V. Garcia, R.C. Laza, A.L. Sanico, R.M. Visperas, and K.G. Cassman. Increased N-use efficiency using a chlorophyll meter on high-yielding irrigated rice. Field Crops Research, 47:243–252, 1996.

S. Peng, R. J. Buresh, J. Huang, J. Yang, Y. Zou, X. Zhong, G. Wang, and F. Zhang. Strategies for overcoming low agronomic nitrogen use efficiency in irrigated rice systems in China. Field Crops Research, 96:37–47, 2006.

V. Pooniya and Y. S. Shivay. Enrichment of basmati rice grain and straw with zinc and nitrogen through ferti-fortification and summer green manuring under Indo-Gangetic plains of India. Journal of Plant Nutrition, 36:91–117, 2013.

V. Pooniya, Y. S. Shivay, A. Rana, L. Nain, and R. Prasanna. Enhancing soil nutrient dynamics and productivity of Basmati rice through residue incorporation and zinc fertilization. European Journal of Agronomy, 41:28–37, 2012.

J. Qiao, L. Yang, T. Yan, F. Xue, and D. Zhao. Nitrogen fertilizer reduction in rice production for two consecutive years in the Taihu Lake area. Agriculture, Ecosystems & Environment, 146:103–112, 2012.

J. Qin, S.M. Impa, Q. Tang, S. Yang, J. Yang, Y. Tao, and K.S.V. Jagadish. Integrated nutrient, water and other agronomic options to enhance rice grain yield and N use efficiency in double-season rice crop. Field Crops Research, 148:15–23, 2013.

S. Qiu, X. Ju, X. Lu, L. Li, J. Ingwersen, T. Streck, P. Christie, and F. Zhang. Improved nitrogen management for an intensive winter wheat/summer maize double-cropping system. Soil Science Society of America Journal, 76:286–297, 2012.

V.O. Sadras and C. Lawson. Nitrogen and water-use efficiency of Australian wheat varieties released between 1958 and 2007. European Journal of Agronomy, 46:34–41, 2013.

R. Setia, K.N. Sharma, P. Marschner, and H. Singh. Changes in nitrogen, phosphorus, and potassium in a long-term continuous maize-wheat cropping system in India. Communications in Soil Science and Plant Analysis, 40:3348–3366, 2009.

A.R. Sharma and U.K. Behera. Nitrogen contribution through Sesbania green manure and dual-purpose legumes in maize–wheat cropping system: agronomic and economic considerations. Plant and Soil, 325:289–304, 2009.

Z. Shi, Q. Jing, J. Cai, D. Jiang, W. Cao, and T. Dai. The fates of 15N fertilizer in relation to root distributions of winter wheat under different N splits. European Journal of Agronomy, 40:86–93, 2012a.

Z. Shi, D. Li, Q. Jing, J. Cai, D. Jiang, W. Cao, and T. Dai. Effects of nitrogen applications on soil nitrogen balance and nitrogen utilization of winter wheat in a rice-wheat rotation. Field Crops Research, 127:241–247, 2012b.

U. Singh, J.K. Ladha, E.G. Castillo, G. Punzalan, A. Tirol-Padre, and M. Duqueza. Genotypic variation in nitrogen use efficiency in medium- and long-duration rice. Field Crops Research, 58:35–53, 1998.

V.K. Singh and B.S. Dwivedi. Yield and nitrogen use efficiency in wheat, and soil fertility status as influenced by substitution of rice with pigeon pea in a rice–wheat cropping system. Australian Journal of Experimental Agriculture, 46:1185–1194, 2006.

E.M.A. Smaling and B.H. Janssen. Calibration of QUEFTS, a model predicting nutrient uptake and yields from chemical soil fertility indices. Geoderma, 59:21–44, 1993.

P. Suriyakup, A. Polthanee, K. Pannangpetch, R. Katawatin, J.C. Mouret, and C. Clermont-Dauphin. Introducing mungbean as a preceding crop to enhance nitrogen uptake and yield of rainfed rice in the north-east of Thailand. Australian Journal of Agricultural Research, 58:1059–1067, 2007.

S. Takahashi, M. R. Anwar, and S. G. de Vera. Effects of compost and nitrogen fertilizer on wheat nitrogen use in Japanese soils. Agronomy Journal, 99:1151–1157, 2007.

C. Tétard-Jones, P. N. Shotton, L. Rempelos, J. Cooper, M. Eyre, C. H. Orr, C. Leifert, and A.M.R. Gatehouse. Quantitative proteomics to study the response of wheat to contrasting fertilisation regimes. Molecular Breeding, 31:379–393, 2013.

J. Timsina, U. Singh, M. Badaruddin, C. Meisner, and M.R. Amin. Cultivar, nitrogen, and water effects on productivity, and nitrogen-use efficiency and balance for rice–wheat sequences of Bangladesh. Field Crops Research, 72:143–161, 2001.

G. Wang, A. Dobermann, C. Witt, Q. Sun, and R. Fu. Performance of site-specific nutrient management for irrigated rice in southeast China. Agronomy Journal, 93:869–878, 2001.

G. Wang, Q.C. Zhang, C. Witt, and R.J. Buresh. Opportunities for yield increases and environmental benefits through site-specific nutrient management in rice systems of Zhejiang province, China. Agricultural Systems, 94:801–806, 2007.

Y. Wang, E. Wang, D. Wang, S. Huang, Y. Ma, C. J. Smith, and L. Wang. Crop productivity and nutrient use efficiency as affected by long-term fertilisation in North China Plain. Nutrient Cycling in Agroecosystems, 86:105–119, 2010.

D. Wei, K. Cui, J. Pan, G. Ye, J. Xiang, L. Nie, and J. Huang. Genetic dissection of grain nitrogen use efficiency and grain yield and their relationship in rice. Field Crops Research, 124:340–346, 2011.

C. Witt, A. Dobermann, S. Abdulrachman, H.C. Gines, W. Guanghuo, R. Nagarajan, S. Satawatananont, T. Thuc Son, P. Sy Tan, L. Van Tiem, and D. C. Olk. Internal nutrient efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field Crops Research, 63:113–138, 1999.

C. Witt, K.G. Cassman, D.C. Olk, U. Biker, S.P. Liboon, M.I. Samson, and J.C.G. Ottow. Crop rotation and residue management effects on carbon sequestration, nitrogen cycling and productivity of irrigated rice systems. Plant and Soil, 225:263–278, 2000.

Y. Xu, L. Nie, R.J. Buresh, J. Huang, K. Cui, B. Xu, W. Gong, and S. Peng. Agronomic performance of late-season rice under different tillage, straw, and nitrogen management. Field Crops Research, 115:79–84, 2010.

C.Y. Xue, X.G. Yang, B.A.M. Bouman, W. Deng, Q.P. Zhang, J. Yang, W.X. Yan, T.Y. Zhang, A.J. Rouzi, H.Q. Wang, and P. Wang. Effects of irrigation and nitrogen on the performance of aerobic rice in northern China. Journal of Integrative Plant Biology, 50:1589–1600, 2008.

Y. Yang, M. Zhang, L. Zheng, D. Cheng, M. Liu, Y. Geng, and J. Chen. Controlled-release urea for rice production and its environmental implications. Journal of Plant Nutrition, 36:781–794, 2013.

Yadvinder-Singh, J.K. Ladha, C.S. Khind, R.K. Gupta, O.P. Meelu, and E. Pasuquin. Long-term effects of organic inputs on yield and soil fertility in the rice–wheat rotation. Soil Science Society of America Journal, 68:845–853, 2004.

Y. Ye, X. Liang, Y. Chen, J. Liu, J. Gu, R. Guo, and L. Li. Alternate wetting and drying irrigation and controlled-release nitrogen fertilizer in late-season rice. Effects on dry matter accumulation, yield, water and nitrogen use. Field Crops Research, 144:212–224, 2013.

J. Ying, S. Peng, G. Yang, N. Zhou, R.M. Visperas, and K.G. Cassman. Comparison of high-yield rice in tropical and subtropical environments II. Nitrogen accumulation and utilization efficiency. Field Crops Research, 57:85–93, 1998.

L. Zhang, J.H.J. Spiertz, S. Zhang, B. Li, and W. Van der Werf. Nitrogen economy in relay intercropping systems of wheat and cotton. Plant and Soil, 303:55–68, 2008.

Appendix B

Journal information of the studies in the review

Table B.1: Journals and their impact factor for studies in the review

| Name | Category | 5 years impact factor | Number of manuscripts selected |
|---|---|---|---|
| Agricultural Systems | Agriculture Multidisciplinary | 2.837 | 2 |
| Agricultural Water Management | Agronomy | 2.552 | 3 |
| Agriculture Ecosystems and Environment | Agriculture Multidisciplinary | 3.673 | 1 |
| Agronomy Journal | Agronomy | 1.989 | 7 |
| Animal Production Science (before Australian Journal of Experimental Agriculture) | Agriculture Multidisciplinary | 1.228 | 3 |
| Canadian Journal of Plant Science | Agronomy | 0.764 | 2 |
| Communications in Soil Science and Plant Analysis | Agronomy | 0.612 | 2 |
| Crop and Pasture Science (before Australian Journal of Agricultural Research) | Agriculture Multidisciplinary | 1.439 | 4 |
| Crop Science | Agronomy | 2.096 | 1 |
| European Journal of Agronomy | Agronomy | 3.311 | 8 |
| Field Crops Research | Agronomy | 2.984 | 28 |
| Geoderma | Soil Science | 2.904 | 1 |
| Journal of Agricultural Science | Agriculture Multidisciplinary | 2.604 | 1 |
| Journal of Integrative Plant Biology | Biochemistry and Molecular Biology | 2.429 | 1 |
| Journal of Plant Nutrition | Plant Science | 0.851 | 4 |
| Molecular Breeding | Agronomy | 3.304 | 1 |
| Nutrient Cycling in Agroecosystems | Soil Science | 1.966 | 8 |
| Paddy Water and Environment | Agronomy | 0.889 | 1 |
| Plant and Soil | Agronomy | 3.108 | 13 |
| Soil Science Society of America Journal | Soil Science | 2.232 | 6 |
| Soil Use and Management | Soil Science | 2.219 | 1 |
| Scientific World Journal | Multidisciplinary Science | 1.603 | 1 |

Appendix C

Equivalence between the Pham-Gia et al. (2006) and Marsaglia (1965, 2006) expressions of the pdf of the ratio

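In this appendix we demonstrate that the expression of the pdf of $T = \frac{a + U}{b + V}$ given in Pham-Gia et al. (2006), where $U$ and $V$ are two independent standard normal variables and $a$ and $b$ are defined as in Eq. 2.5, is equivalent to the one in Marsaglia (1965, 2006). Firstly, let us calculate the integral:

\[ H_{-2}(z) = \int_0^{\infty} t\, e^{-t^2 - 2tz}\, dt \]

Notice that:

\[ -1 = \int_0^{\infty} (-2t - 2z)\, e^{-t^2 - 2tz}\, dt = -2H_{-2}(z) - 2z \int_0^{\infty} e^{-t^2 - 2tz}\, dt \]

If we define:

\[ A = \int_0^{\infty} e^{-t^2 - 2tz}\, dt = \int_0^{\infty} e^{-t^2 - 2tz + z^2 - z^2}\, dt = e^{z^2} \int_0^{\infty} e^{-(t+z)^2}\, dt \]

By performing the following change of variable, we obtain:

\[ t + z = \frac{r}{\sqrt{2}} \;\Rightarrow\; dt = \frac{dr}{\sqrt{2}} \]

\[ A = \frac{e^{z^2}}{\sqrt{2}} \int_{\sqrt{2}z}^{\infty} e^{-r^2/2}\, dr = \frac{e^{z^2}}{\sqrt{2}} \sqrt{2\pi} \int_{\sqrt{2}z}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-r^2/2}\, dr = e^{z^2} \sqrt{\pi}\, \big(1 - \phi(\sqrt{2}z)\big) \tag{C.1} \]

Therefore, $H_{-2}(z) = \frac{1}{2} - z e^{z^2} \sqrt{\pi}\, \big(1 - \phi(\sqrt{2}z)\big)$. If we substitute the latter expression in Eq. 2.8 with $\sigma_x = \sigma_y = 1$ and $\mu_x = b$, $\mu_y = a$, we arrive at:

\[ f_r(r) = \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)} \big[ H_{-2}(s(r)) + H_{-2}(-s(r)) \big] \]

where:

\[ s(r) = \frac{ar + b}{\sqrt{2(r^2 + 1)}} = \frac{q}{\sqrt{2}}, \quad \text{with } q \text{ defined as in Eq. 2.6} \]

Then,

\[ f_t(t) = \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)} \Big[ 1 + \frac{q}{\sqrt{2}} e^{q^2/2} \sqrt{\pi}\, \big( -1 + \phi(q) + 1 - \phi(-q) \big) \Big] \]

\[ = \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)} \Big[ 1 + \frac{q}{\sqrt{2}} e^{q^2/2} \sqrt{\pi} \int_{-q}^{q} \frac{1}{\sqrt{2\pi}} e^{-r^2/2}\, dr \Big] = \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)} \Big[ 1 + \frac{q}{2} e^{q^2/2}\, 2 \int_0^{q} e^{-r^2/2}\, dr \Big] \]

\[ = \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)} \Big[ 1 + q e^{q^2/2} \int_0^{q} e^{-r^2/2}\, dr \Big] = \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)} + \frac{e^{-(a^2+b^2)/2}}{\pi(1 + r^2)}\, q \int_0^{q} e^{-(r^2 - q^2)/2}\, dr \]

\[ = \big( e^{-(a^2+b^2)/2} \big) \frac{1}{\pi(1 + r^2)} + \big( 1 - e^{-(a^2+b^2)/2} \big) \frac{q \int_0^{q} e^{-(r^2 - q^2)/2}\, dr}{\pi(1 + r^2)\big( e^{(a^2+b^2)/2} - 1 \big)} \]

The last equality is obtained noticing that

\[ e^{-(a^2+b^2)/2} = \frac{1 - e^{-(a^2+b^2)/2}}{e^{(a^2+b^2)/2} - 1} \]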
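The closed form obtained for the integral H−2(z) in this appendix can be checked numerically. The following is a minimal sketch in base R, not part of the original analysis: the function names `H_numeric` and `H_closed` and the test values of z are illustrative assumptions, and `pnorm` plays the role of the standard normal cdf written ϕ in the derivation.

```r
# H_{-2}(z) by numerical integration of t * exp(-t^2 - 2tz) over (0, Inf)
H_numeric <- function(z) {
  integrate(function(t) t * exp(-t^2 - 2 * t * z),
            lower = 0, upper = Inf, rel.tol = 1e-8)$value
}

# Closed form: 1/2 - z * exp(z^2) * sqrt(pi) * (1 - Phi(sqrt(2) * z)),
# where 1 - Phi(.) is the upper tail of the standard normal cdf
H_closed <- function(z) {
  0.5 - z * exp(z^2) * sqrt(pi) * pnorm(sqrt(2) * z, lower.tail = FALSE)
}

# Illustrative values of z (hypothetical, chosen on both sides of zero)
z_values <- c(-1, -0.3, 0, 0.7, 1.5)
max_abs_diff <- max(abs(sapply(z_values, H_numeric) - sapply(z_values, H_closed)))
max_abs_diff
```

A difference close to zero for every z supports the expression for H−2(z) that is substituted into Eq. 2.8.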

Appendix D

Application of the EM algorithm for estimating the parameters of a mixture of multivariate Gaussian distributions

In this appendix we detail the application of the EM algorithm to estimate the parameter vector of a mixture of multivariate Gaussian distributions. Following the approach of Dempster et al. (1977), the first step is to compute $L_c(z, y, \psi)$. The log likelihood function of the complete random sample is given by (McLachlan & Peel, 2007 cited in Figueiredo & Jain, 2002):

\[ L_c(z, y, \psi) = \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \ln\big( \pi_i\, \varphi(y_j \mid \mu_i, \Sigma_i) \big) = \sum_{j=1}^{n} \sum_{i=1}^{g} z_{ij} \big[ \ln(\pi_i) + \ln \varphi(y_j \mid \mu_i, \Sigma_i) \big] \]

where $\varphi$ is the pdf of a MVN. The two steps of the E-M algorithm are:

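1. Compute $E(L_c(z, y, \psi) \mid y, \psi^{(r)})$, where $\psi^{(r)} = (\pi_1^{(r)}, \dots, \pi_{g-1}^{(r)}, \mu_1^{(r)}, \dots, \mu_g^{(r)}, \Sigma_1^{(r)}, \dots, \Sigma_g^{(r)})$, for a fixed $g$.

\[ E(L_c(z, y, \psi) \mid y, \psi^{(r)}) = \sum_{j=1}^{n} \sum_{i=1}^{g} E(z_{ij} \mid y_j, \psi^{(r)}) \big[ \ln \pi_i + \ln \varphi(y_j \mid \mu_i, \Sigma_i) \big] \]

\[ E(z_{ij} \mid y_j, \psi^{(r)}) = 1 \cdot P(z_{ij} = 1 \mid y_j, \psi^{(r)}) + 0 \cdot P(z_{ij} = 0 \mid y_j, \psi^{(r)}) = P(z_{ij} = 1 \mid y_j, \psi^{(r)}) \tag{D.1} \]

By Eq. 3.2 and D.1, it follows that:

\[ E(z_{ij} \mid y_j, \psi^{(r)}) = P(z_{ij} = 1 \mid y_j, \psi^{(r)}) = \tau_{ij}^{(r+1)} = \frac{\pi_i^{(r)} \varphi(y_j \mid \mu_i^{(r)}, \Sigma_i^{(r)})}{\sum_{k=1}^{g} \pi_k^{(r)} \varphi(y_j \mid \mu_k^{(r)}, \Sigma_k^{(r)})} \]

2. Find $\psi^{(r+1)}$ that maximises $E(L_c(z, y, \psi) \mid y, \psi^{(r)})$:

\[ E(L_c(z, y, \psi) \mid y, \psi^{(r)}) = \sum_{j=1}^{n} \sum_{i=1}^{g} \big[ \tau_{ij}^{(r+1)} \ln(\pi_i) + \tau_{ij}^{(r+1)} \ln \varphi(y_j \mid \mu_i, \Sigma_i) \big] \tag{D.2} \]

Firstly, let us find $\hat{\pi}_i^{(r+1)}$, $i = 1 \dots g$. The second addend of Eq. D.2 does not depend on $\pi$, so it is enough to differentiate: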

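\[ h(\pi) = \sum_{j=1}^{n} \sum_{i=1}^{g} \tau_{ij}^{(r+1)} \ln(\pi_i) \]

and

\[ \frac{\partial h(\pi)}{\partial \pi_i} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}}{\pi_i} - \frac{\sum_{j=1}^{n} \tau_{gj}^{(r+1)}}{1 - \pi_1 - \dots - \pi_{g-1}} \]

Denoting $n_i = \sum_{j=1}^{n} \tau_{ij}^{(r+1)}$, we get

\[ \frac{\partial h(\pi)}{\partial \pi_i} = \frac{n_i}{\pi_i} - \frac{n_g}{1 - \pi_1 - \dots - \pi_{g-1}} = \frac{n_i (1 - \pi_1 - \dots - \pi_{g-1}) - n_g \pi_i}{\pi_i (1 - \pi_1 - \dots - \pi_{g-1})} = 0 \tag{D.3} \]

Note that $\forall s$, $s \neq i$, by subtracting $\frac{\partial h(\pi)}{\partial \pi_i} - \frac{\partial h(\pi)}{\partial \pi_s}$, we get that

\[ \pi_s = \frac{n_s \pi_i}{n_i} \quad \forall s = 1 \dots g, \; s \neq i \]

If we substitute $\pi_s$, $s = 1 \dots g$, into the expression D.3 we obtain:

\[ n_i \Big( 1 - \frac{n_1 \pi_i}{n_i} - \frac{n_2 \pi_i}{n_i} - \dots - \frac{n_{g-1} \pi_i}{n_i} \Big) - n_g \pi_i = 0 \]

\[ n_i - \pi_i (n_1 + n_2 + \dots + n_g) = 0 \tag{D.4} \]

Notice that:

\[ n_1 + n_2 + \dots + n_g = \sum_{j=1}^{n} \big( \tau_{1j}^{(r+1)} + \tau_{2j}^{(r+1)} + \dots + \tau_{gj}^{(r+1)} \big) = n \tag{D.5} \]

Therefore, by Eq. D.4 and D.5, it is concluded that:

\[ \hat{\pi}_i^{(r+1)} = \frac{n_i}{n} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}}{n} \]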

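Taking derivatives with respect to $\mu_i$ and $\Sigma_i$, $\forall i = 1 \dots g$, in Eq. D.2:

\[ \sum_{k=1}^{g} \sum_{j=1}^{n} \tau_{kj}^{(r+1)} \frac{\partial}{\partial \mu_i} \ln \varphi(y_j \mid \mu_k, \Sigma_k) = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i} \ln \varphi(y_j \mid \mu_i, \Sigma_i) = 0 \]

\[ \sum_{k=1}^{g} \sum_{j=1}^{n} \tau_{kj}^{(r+1)} \frac{\partial}{\partial \Sigma_i} \ln \varphi(y_j \mid \mu_k, \Sigma_k) = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i} \ln \varphi(y_j \mid \mu_i, \Sigma_i) = 0 \tag{D.6} \]

Note that $\frac{\partial}{\partial \mu_i} \ln \varphi(y_j \mid \mu_k, \Sigma_k) = 0$ and $\frac{\partial}{\partial \Sigma_i} \ln \varphi(y_j \mid \mu_k, \Sigma_k) = 0$ if $i \neq k$. Using some properties detailed in (Rencher, 1998, p. 416), it is straightforward to show that $\mu_i^{(r+1)}$ and $\Sigma_i^{(r+1)}$ satisfying D.6 exist in a closed form given by:

\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]

\[ \Sigma_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)}) \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]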

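We firstly calculate $\mu_i^{(r+1)}$:

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \ln \varphi(y_j \mid \mu_i, \Sigma_i) \]

\[ = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \Big[ -p \ln(\sqrt{2\pi}) - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} (y_j - \mu_i) \Sigma_i^{-1} (y_j - \mu_i)^{\top} \Big] \]

\[ = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \Big[ -\frac{1}{2} (y_j - \mu_i) \Sigma_i^{-1} (y_j - \mu_i)^{\top} \Big] \]

\[ = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \Big[ -\frac{1}{2} \big( y_j \Sigma_i^{-1} y_j^{\top} - y_j \Sigma_i^{-1} \mu_i^{\top} - \mu_i \Sigma_i^{-1} y_j^{\top} + \mu_i \Sigma_i^{-1} \mu_i^{\top} \big) \Big] \]

\[ = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \mu_i^{\top}} \Big[ -\frac{1}{2} \big( -y_j \Sigma_i^{-1} \mu_i^{\top} - (\mu_i^{\top})^{\top} \Sigma_i^{-1} y_j^{\top} + (\mu_i^{\top})^{\top} \Sigma_i^{-1} \mu_i^{\top} \big) \Big] \]

Using A.13.2 and A.13.3 in Rencher (1998), which state that $\frac{\partial (a^{\top} x)}{\partial x} = \frac{\partial (x^{\top} a)}{\partial x} = a$ and $\frac{\partial (x^{\top} A x)}{\partial x} = 2Ax$ respectively, the derivative equals

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big( -\frac{1}{2} \Big) \big[ -(y_j \Sigma_i^{-1})^{\top} - \Sigma_i^{-1} y_j^{\top} + 2 \Sigma_i^{-1} \mu_i^{\top} \big] = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \big[ \Sigma_i^{-1} (y_j^{\top} - \mu_i^{\top}) \big] \]

Setting this expression equal to 0 gives

\[ \mu_i^{(r+1)} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} y_j \Big/ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \]

Now we calculate $\Sigma_i^{(r+1)}$. Instead of taking derivatives with respect to $\Sigma_i$, we do so with respect to $\Sigma_i^{-1}$, with the mean fixed at $\mu_i^{(r+1)}$.

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \ln \varphi(y_j \mid \mu_i^{(r+1)}, \Sigma_i) \]

\[ = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \Big\{ -p \ln \sqrt{2\pi} - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} \Big\} \]

\[ = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \Big\{ \frac{1}{2} \ln |\Sigma_i^{-1}| - \frac{1}{2} (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} \Big\} \]

But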

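\[ (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} = \mathrm{tr}\big\{ (y_j - \mu_i^{(r+1)}) \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} \big\} \]

and $\mathrm{tr}(CD) = \mathrm{tr}(DC)$, so the derivative becomes

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \frac{\partial}{\partial \Sigma_i^{-1}} \Big\{ \frac{1}{2} \ln |\Sigma_i^{-1}| - \frac{1}{2} \mathrm{tr}\big( \Sigma_i^{-1} (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)}) \big) \Big\} \]

Let us take $D_j = (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)})$. Applying $\frac{\partial\, \mathrm{tr}(CD)}{\partial C} = D + D^{\top} - \mathrm{diag}(D)$ (A.13.5 Rencher, 1998) and $\frac{\partial \ln |C|}{\partial C} = 2C^{-1} - \mathrm{diag}(C^{-1})$ (A.13.6 Rencher, 1998) we get:

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big\{ \frac{1}{2} \big( 2\Sigma_i - \mathrm{diag}(\Sigma_i) - D_j - D_j^{\top} + \mathrm{diag}(D_j) \big) \Big\} \]

$D_j$ is symmetric, so this expression equals

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big\{ \Sigma_i - \frac{1}{2} \mathrm{diag}(\Sigma_i) - D_j + \frac{1}{2} \mathrm{diag}(D_j) \Big\} \]

If we set the previous expression equal to 0, we get

\[ \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big\{ \Sigma_i - \frac{1}{2} \mathrm{diag}(\Sigma_i) \Big\} = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \Big\{ D_j - \frac{1}{2} \mathrm{diag}(D_j) \Big\} \]

\[ \Rightarrow \; \Sigma_i^{(r+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)} D_j}{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(r+1)} (y_j - \mu_i^{(r+1)})^{\top} (y_j - \mu_i^{(r+1)})}{\sum_{j=1}^{n} \tau_{ij}^{(r+1)}} \]

To show the second order condition for $(\pi_1^{(r+1)}, \dots, \pi_g^{(r+1)}, \mu_1^{(r+1)}, \dots, \mu_g^{(r+1)}, \Sigma_1^{(r+1)}, \dots, \Sigma_g^{(r+1)})$ to be a maximum, we firstly demonstrate the following property: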

Property 1: If $g(y \mid \theta_1 \dots \theta_n) = \sum_{i=1}^{n} g_i(y \mid \theta_i)$ and $\hat{\theta}_i$ is a local maximum of $g_i(y \mid \theta_i)$ $\forall i$, then $(\hat{\theta}_1, \dots, \hat{\theta}_n)$ is a local maximum of $g(y \mid \theta_1, \dots, \theta_n)$.

Proof. The sufficient conditions to show that $(\hat{\theta}_1, \dots, \hat{\theta}_n)$ is a maximum of $g(y \mid \theta_1 \dots \theta_n)$ are (see Snyman, 2005):

1. $Dg(\hat{\theta}_1, \dots, \hat{\theta}_n) = 0$, where $Dg$ is the gradient of $g$.

2. $H_g(\hat{\theta}_1, \dots, \hat{\theta}_n)$ is negative definite, where $H_g$ is the Hessian matrix of $g$.

The first condition is straightforward to show by taking into account that $\frac{\partial g}{\partial \theta_i} = \frac{\partial g_i}{\partial \theta_i} = 0$ $\forall i$. The second condition follows by considering that $H_g$ is a block diagonal matrix in which all the blocks are negative definite matrices:

\[ H_g = \begin{pmatrix} H_{g_1} & 0 & \dots & 0 \\ 0 & H_{g_2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & H_{g_n} \end{pmatrix} \]

Thus, $|H_g - \lambda I| = |H_{g_1} - \lambda I| \cdots |H_{g_n} - \lambda I|$ and this implies that all the eigenvalues of $H_g$ are negative. $\blacksquare$

Now we want to prove that $(\hat{\pi}_1, \dots, \hat{\pi}_g, \hat{\mu}_1, \dots, \hat{\mu}_g, \hat{\Sigma}_1, \dots, \hat{\Sigma}_g)$ is the global maximum of $E(L_c(z, y, \psi) \mid y, \psi^{(r)})$. We can express

\[ E(L_c(z, y, \psi) \mid y, \psi^{(r)}) = h(\pi) + \sum_{i=1}^{g} g_i(y \mid \mu_i, \Sigma_i) \]

with:

\[ h(\pi) = \sum_{j=1}^{n} \sum_{i=1}^{g} \tau_{ij}^{(r+1)} \ln(\pi_i) \tag{D.7} \]

\[ g_i(y \mid \mu_i, \Sigma_i) = \sum_{j=1}^{n} \tau_{ij}^{(r+1)} \ln \varphi(y_j \mid \mu_i, \Sigma_i) \tag{D.8} \]

It is straightforward to show that $\pi^{(r+1)}$ is the global maximum of $h(\pi)$ by calculating the second derivative and applying Property 1. To show that $(\mu_i^{(r+1)}, \Sigma_i^{(r+1)})$ is the global maximum of $g_i(y \mid \mu_i, \Sigma_i)$ we follow the argument in Anderson & Olkin (1985) for the MLE of the multivariate normal distribution. The function $g_i(y \mid \mu_i, \Sigma_i)$ is continuously differentiable and $g_i(y \mid \mu_i, \Sigma_i) \to -\infty$ when the parameters approach the boundary of the parameter space. Then, its only critical point is the global maximum. By Property 1, $(\pi_1^{(r+1)}, \dots, \pi_g^{(r+1)}, \mu_1^{(r+1)}, \dots, \mu_g^{(r+1)}, \Sigma_1^{(r+1)}, \dots, \Sigma_g^{(r+1)})$ is the global maximum of $E(L_c(z, y, \psi) \mid y, \psi^{(r)})$.
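The E-step and the closed-form M-step updates derived in this appendix can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the EMMIX implementation: the helper names `dmvn` and `em_step`, the layout of the list `psi` (`pro`, `mu`, `sigma`) and the simulated data are all hypothetical. In the code, `tau[j, i]` holds the posterior probability written τij in the text (component i, observation j).

```r
# Density of a p-variate normal evaluated at each row of y (base R only)
dmvn <- function(y, mu, Sigma) {
  p <- ncol(y)
  dev <- sweep(y, 2, mu)                       # y_j - mu, row by row
  md <- rowSums((dev %*% solve(Sigma)) * dev)  # squared Mahalanobis distances
  exp(-0.5 * md) / sqrt((2 * pi)^p * det(Sigma))
}

# One EM iteration. psi is a list with pro (length g), mu (p x g)
# and sigma (p x p x g); the returned loglik is evaluated at the input psi.
em_step <- function(y, psi) {
  n <- nrow(y)
  g <- length(psi$pro)
  # E-step: posterior probabilities of component membership
  dens <- sapply(seq_len(g), function(i)
    psi$pro[i] * dmvn(y, psi$mu[, i], psi$sigma[, , i]))
  tau <- dens / rowSums(dens)
  # M-step: closed-form updates of proportions, means and covariances
  ni <- colSums(tau)
  pro <- ni / n
  mu <- sweep(t(y) %*% tau, 2, ni, "/")
  sigma <- psi$sigma
  for (i in seq_len(g)) {
    dev <- sweep(y, 2, mu[, i])
    sigma[, , i] <- t(dev) %*% (tau[, i] * dev) / ni[i]
  }
  list(pro = pro, mu = mu, sigma = sigma, loglik = sum(log(rowSums(dens))))
}

# Simulated two-group bivariate data (purely illustrative)
set.seed(1)
y <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 4), ncol = 2))
psi <- list(pro = c(0.5, 0.5),
            mu = cbind(c(-1, -1), c(3, 3)),
            sigma = array(diag(2), c(2, 2, 2)))
ll <- numeric(5)
for (r in 1:5) {
  psi <- em_step(y, psi)
  ll[r] <- psi$loglik
}
```

Because each iteration maximises the expected complete-data log likelihood, the values stored in `ll` should never decrease, and the updated mixing proportions should sum to one.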

Appendix E

R code for fitting mixture models of univariate Gaussian distributions

Univariate mixture models can be fitted using the R packages mclust (Fraley & Raftery, 1999) or mixtools (Benaglia et al., 2009). For our analysis, we used mixtools because it is more convenient for implementing the starting strategies described in Section 2.3 of the manuscript. Starting strategy iii was not used because it tends to produce negative initial estimates of the means of the ratio, which is not biologically possible. The procedure for fitting mixture models is detailed in Section 2.3 of the manuscript. In this complementary material, we display the code employed to perform the univariate mixture analyses. For the purpose of illustrating the analysis, we consider the number of groups (g) fixed and equal to 3.

Firstly, load the packages mclust and mixtools.

    library(mclust)
    library(mixtools)

Then, read the csv file containing the data set.

    dataset <- read.table(file.choose(), header = TRUE, sep = ",")
    IEN <- dataset$IEN
    g <- 3                        # g is the number of groups
    size <- length(dataset$IEN)   # size is the sample size

A function called initial was created to calculate the initial estimates of the mixture when a cluster partition is provided. The inputs needed for this function are: the data (dat); the sample size (size); a vector containing the classification of each of the observations into groups (clust); and the number of groups (g).

    initial <- function(dat, size, clust, g) {
      dat <- as.matrix(dat)
      # unmap converts the vector clust into a matrix with number of rows and
      # columns given by the sample size and the number of groups,
      # respectively. The entry class[i, k] has two possible values: 1, if the
      # i-th observation is classified in the k-th group, or 0 otherwise.
      class <- unmap(clust)
      # Multiplying the data by class we get the sums of the observation
      # values classified in each of the groups.
      sum <- t(dat) %*% class
      k <- rep(0, g)
      mu <- rep(0, g)
      lambda <- rep(0, g)
      aux <- matrix(nrow = g, ncol = size)
      for (i in 1:g) {
        # k denotes the number of observations classified in each of the groups
        k[i] <- sum(class[, i])
        mu[i] <- sum[, i] / k[i]
        lambda[i] <- k[i] / size
        aux[i, ] <- rep(mu[i], size)
      }
      diff <- matrix(nrow = g, ncol = size)
      sumvariance <- rep(0, g)
      variance <- rep(0, g)
      for (i in 1:g) {
        diff[i, ] <- (aux[i, ] - dat)^2
        sumvariance[i] <- t(diff[i, ]) %*% class[, i]
        variance[i] <- sumvariance[i] / k[i]
      }
      # lambda, mu and variance are vectors containing the initial estimates of
      # the mixing proportions, means and variances, respectively. These values
      # are stored in a list called init, which is returned by the function
      # initial.
      init <- list()
      init$lambda <- lambda
      init$mean <- mu
      init$variance <- variance
      return(init)
    }
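As a cross-check of the quantities returned by initial, the same initial estimates can be obtained from a partition with base R alone. The sketch below is an illustration only: `init_from_partition` and the toy data are hypothetical and do not appear in the original analysis. As in initial, the variances are divided by the group size rather than by the group size minus one.

```r
# Mixing proportions, means and (population) variances implied by a partition
init_from_partition <- function(x, clust) {
  list(
    lambda = as.vector(table(clust)) / length(x),
    mean = as.vector(tapply(x, clust, mean)),
    variance = as.vector(tapply(x, clust, function(v) mean((v - mean(v))^2)))
  )
}

# Toy data: three observations in group 1 and two in group 2
x <- c(1, 2, 3, 10, 12)
clust <- c(1, 1, 1, 2, 2)
init <- init_from_partition(x, clust)
init
```

For this toy partition the proportions are 0.6 and 0.4, the means 2 and 11, and the variances 2/3 and 1.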

Now, proceed to fit univariate mixture models initiating the EM algorithm with the strategies detailed in Section 2.3 of the manuscript.

i) Random starts

    clust1 <- sample(1:g, size, replace = TRUE)
    init1 <- initial(dat = IEN, size = size, clust = clust1, g = g)
    # normalmixEM expects standard deviations in sigma, hence the square root
    model1 <- normalmixEM(IEN, lambda = init1$lambda, mu = init1$mean,
                          sigma = sqrt(init1$variance), epsilon = 1e-06, k = g,
                          maxit = 10000)

    ## number of iterations= 789

    plot(model1, whichplot = 2)

Figure E.1: Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from random starts.

ii) K-means

    clust2 <- kmeans(IEN, g, nstart = 5)$cluster
    init2 <- initial(dat = IEN, size = size, clust = clust2, g = g)
    # normalmixEM expects standard deviations in sigma, hence the square root
    model2 <- normalmixEM(IEN, lambda = init2$lambda, mu = init2$mean,
                          sigma = sqrt(init2$variance), epsilon = 1e-06, k = g,
                          maxit = 10000)

    ## number of iterations= 1058

    plot(model2, whichplot = 2)

Figure E.2: Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated from the partition provided by the K-means algorithm.

iv) Subsample solution

    sequence <- seq(1, size, 1)
    index <- sample(sequence, size = 200, replace = FALSE)
    subsample <- IEN[index]
    clustsub <- sample(1:g, 200, replace = TRUE)
    initsub <- initial(dat = subsample, size = 200, clust = clustsub, g = g)
    # Short run of the EM algorithm on the subsample; normalmixEM expects
    # standard deviations in sigma, hence the square root
    modelsub <- normalmixEM(subsample, lambda = initsub$lambda, mu = initsub$mean,
                            sigma = sqrt(initsub$variance), k = g, maxit = 10,
                            epsilon = 0.01)

    ## number of iterations= 2

    # The fitted object stores the estimates in $lambda, $mu and $sigma
    model3 <- normalmixEM(IEN, lambda = modelsub$lambda, mu = modelsub$mu,
                          sigma = modelsub$sigma, epsilon = 1e-06, k = g,
                          maxit = 10000)

    ## number of iterations= 812

    plot(model3, whichplot = 2)

Figure E.3: Histogram of the internal nitrogen use efficiency data and the mixture components found by the EM algorithm initiated on a random subsample.

v) Short runs

    # normalmixEM expects standard deviations in sigma, hence the square roots
    clustshort1 <- sample(1:g, size, replace = TRUE)
    initshort1 <- initial(dat = IEN, size = size, clust = clustshort1, g = g)
    modelshort1 <- normalmixEM(IEN, lambda = initshort1$lambda,
                               mu = initshort1$mean,
                               sigma = sqrt(initshort1$variance),
                               epsilon = 0.01, k = g, maxit = 10000)

    ## number of iterations= 2

    clustshort2 <- sample(1:g, size, replace = TRUE)
    initshort2 <- initial(dat = IEN, size = size, clust = clustshort2, g = g)
    modelshort2 <- normalmixEM(IEN, lambda = initshort2$lambda,
                               mu = initshort2$mean,
                               sigma = sqrt(initshort2$variance),
                               epsilon = 0.01, k = g, maxit = 10000)

    ## number of iterations= 2

    clustshort3 <- sample(1:g, size, replace = TRUE)
    initshort3 <- initial(dat = IEN, size = size, clust = clustshort3, g = g)
    modelshort3 <- normalmixEM(IEN, lambda = initshort3$lambda,
                               mu = initshort3$mean,
                               sigma = sqrt(initshort3$variance),
                               epsilon = 0.01, k = g, maxit = 10000)

    ## number of iterations= 2

    modelshort1$loglik

    ## [1] -1646

    modelshort2$loglik

    ## [1] -1646

    modelshort3$loglik

    ## [1] -1646

    max(modelshort1$loglik, modelshort2$loglik, modelshort3$loglik)

    ## [1] -1646

If modelshort3 provides the solution with the highest value of the log likelihood function among the short runs of the algorithm, use this solution to initiate a complete run of the algorithm.

    # The fitted object stores the estimates in $lambda, $mu and $sigma
    modellong <- normalmixEM(IEN, lambda = modelshort3$lambda,
                             mu = modelshort3$mu, sigma = modelshort3$sigma,
                             epsilon = 1e-06, k = g, maxit = 10000)

    ## number of iterations= 487

    plot(modellong, whichplot = 2)

Figure E.4: Histogram of the internal nitrogen use efficiency data and the mixture components found after running several short runs of the EM algorithm.

    model1$loglik

    ## [1] -1628

    model2$loglik

    ## [1] -1628

    model3$loglik

    ## [1] -1629

    modellong$loglik

    ## [1] -1629

Delete spurious solutions. From the remaining solutions, select the one with the highest value of the log likelihood function. If there are no spurious solutions, proceed as follows:

    max(model1$loglik, model2$loglik, model3$loglik, modellong$loglik)

    ## [1] -1628

If model1 is the solution with the highest value of the log likelihood function, record its AIC and BIC values.

    # 3 * g - 1 is the number of free parameters of a univariate mixture with
    # g components: g - 1 mixing proportions, g means and g variances
    aic <- function(loglik, g, size) {
      aic <- -2 * loglik + 2 * (3 * g - 1)
      return(aic)
    }
    bic <- function(loglik, g, size) {
      bic <- -2 * loglik + (3 * g - 1) * log(size)
      return(bic)
    }
    aic(loglik = model1$loglik, g, size)

    ## [1] 3272

    bic(loglik = model1$loglik, g, size)

    ## [1] 3305
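The parameter count behind these criteria generalises to p dimensions: a g-component Gaussian mixture with unrestricted covariance matrices has (g − 1) + gp + gp(p + 1)/2 free parameters, which gives 3g − 1 in the univariate case. The sketch below is a hedged illustration of that arithmetic: `n_free_par` and `info_criteria` are hypothetical helpers, the log likelihood −1628 is the rounded value printed above, and the sample size of 432 is an assumption taken from the bivariate analysis in Appendix F.

```r
# Free parameters of a g-component mixture of p-variate Gaussians with
# unrestricted covariance matrices
n_free_par <- function(g, p) {
  (g - 1) + g * p + g * p * (p + 1) / 2
}

# AIC and BIC for a fitted mixture with maximised log likelihood loglik
info_criteria <- function(loglik, g, n, p = 1) {
  d <- n_free_par(g, p)
  c(aic = -2 * loglik + 2 * d,
    bic = -2 * loglik + d * log(n))
}

# Univariate case with g = 3, rounded loglik and an assumed n = 432
ic <- info_criteria(loglik = -1628, g = 3, n = 432)
ic
```

With these inputs the AIC is 3272 and the BIC rounds to 3305, in line with the values recorded for model1.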

This procedure is repeated starting with the maximum number of components until fitting just one component. However, mixtools does not allow a single component to be fitted, so mclust is needed when g = 1. The commands used for fitting one component with mclust are:

    model1 <- Mclust(dataset$IEN, G = 1, modelNames = "V")
    model1$parameters

    ## $pro
    ## [1] 1
    ##
    ## $mean
    ## [1] 44.27
    ##
    ## $variance
    ## $variance$modelName
    ## [1] "X"
    ##
    ## $variance$d
    ## [1] 1
    ##
    ## $variance$G
    ## [1] 1
    ##
    ## $variance$sigmasq
    ## [1] 119.4

Finally, the model selected is the one which minimises the information criteria.

Appendix F

R code for fitting mixture models of bivariate Gaussian distributions

Bivariate mixture models were fitted using the R package EMMIX (McLachlan et al., 1999). This package can be downloaded from the Web site: http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX_R/index.html. The procedure for fitting mixture models has been detailed in Section 2.3 of the manuscript. Here, we display the code employed to fit mixture models of bivariate Gaussian distributions. For the purpose of illustrating the analysis, we consider the number of components (g) fixed and equal to 3.

Firstly, set the RStudio working directory to the root folder in which the package EMMIX has been downloaded. Then, load EMMIX and the package mnormt. The latter is needed for starting strategy iii (see Section 2.3 of the manuscript).

    source("EMMIX.R")
    library(mnormt)

Read the csv file containing the data set.

    dataset <- read.table(file.choose(), header = TRUE, sep = ",")

For the bivariate analysis, the EM algorithm operates with the measurements of grain yield (dataset$YGR14) and nitrogen uptake (dataset$NUPT). These measurements are stored in the matrix named dat.

    size <- 432      # size is the sample size
    dat <- matrix(nrow = size, ncol = 2)
    dat[, 1] <- dataset$NUPT
    dat[, 2] <- dataset$YGR14
    distr <- "mvn"   # distr refers to the type of component densities of the
                     # mixture. In our case, multivariate Gaussian
                     # distributions (mvn)
    ncov <- 3        # ncov = 3 indicates that each group is allowed to have a
                     # different covariance matrix
    g <- 3           # g is the number of groups

A function called classcolor was created to visualise the solutions provided by the EM algorithm. The inputs needed for this function are: the data set (y); a logical parameter (ellipse), which can be TRUE or FALSE depending on whether the user wants to draw the prediction ellipses; the mixture model fitted by EMMIX (obj); the significance level for the prediction ellipses (a), so that a = 0.1 gives 90% prediction regions; the size of the sample (n); and the number of groups (g). The output of this function is a plot which displays the observations in different colours depending on their classification. Furthermore, if ellipse = TRUE, the prediction ellipses for a new point of the population to belong to each of the groups are drawn. The ellipses are drawn by modifying the function ellipse implemented in the R package ellipse (Murdoch et al., 2007) according to Eq. 4.4 in Chew (1966).

    classcolor <- function(y, ellipse, obj, a, n, g) {
      p <- 2  # p is the dimension of the random vector
      # aux is a matrix whose first two columns contain the data of nitrogen
      # uptake and grain yield, and whose third column contains the number of
      # the group in which the observations have been classified.
      aux <- matrix(nrow = n, ncol = 3)
      tau <- obj$tau
      for (i in 1:n) {
        aux[i, 1] <- y[i, 1]
        aux[i, 2] <- y[i, 2]
        aux[i, 3] <- which(tau[i, ] == max(tau[i, ]))
      }
      # Next, the measurements of grain yield and nitrogen uptake are coloured
      # according to the classification given by the EM algorithm.
      xmin <- 0
      xmax <- max(y[, 1]) + 20
      ymin <- 0
      ymax <- max(y[, 2]) + 20
      plot(aux[, 1][aux[, 3] == 1], aux[, 2][aux[, 3] == 1], col = "red",
           xlim = c(xmin, xmax), ylim = c(ymin, ymax),
           ylab = "Grain Yield (kg/ha)", xlab = "Nitrogen Uptake (kg/ha)",
           cex.lab = 1.2, cex.axis = 1.2)
      for (i in 2:g) {
        points(aux[, 1][aux[, 3] == i], aux[, 2][aux[, 3] == i], col = i + 1)
      }
      # If ellipse = TRUE, the function ellipse2 draws the prediction ellipses
      ellipse2 <- function(mu, sigma, alpha, npoints, newplot, r1, draw, ...) {
        es <- eigen(sigma)
        e1 <- es$vectors %*% diag(sqrt(es$values))
        theta <- seq(0, 2 * pi, length.out = npoints)
        v1 <- cbind(r1 * cos(theta), r1 * sin(theta))
        pts <- t(mu - (e1 %*% t(v1)))
        if (newplot && draw) {
          plot(pts, ...)
        } else if (!newplot && draw) {
          lines(pts, ...)
        }
        invisible(pts)
      }
      if (ellipse == "TRUE") {
        for (i in 1:g) {
          # n1 is the number of observations classified in group i
          n1 <- length(aux[aux[, 3] == i]) / 3
          q1 <- qf(1 - a, 2, n1 - 2)
          def <- ((n1 - 1) * (n1 + 1) * p * q1) / ((n1 - p) * n1)
          ellipse2(mu = obj$mu[, i], sigma = obj$sigma[, , i], alpha = a,
                   r1 = sqrt(def), npoints = 1000, newplot = FALSE,
                   draw = TRUE, type = "l", lwd = 2, col = i + 1)
          points(obj$mu[, i][1], obj$mu[, i][2], col = "black", pch = 16)
        }
      }
    }
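The scaling of the prediction ellipses in classcolor can be isolated in a small helper. The sketch below is an illustration under stated assumptions rather than part of the original code: `pred_radius` is a hypothetical name, and the function reproduces the quantity sqrt(def) computed inside classcolor for a group of n observations in p dimensions at significance level a.

```r
# Radius multiplier of the (1 - a) prediction ellipse for a new observation
# from a p-variate normal population estimated from n observations
pred_radius <- function(n, a, p = 2) {
  q <- qf(1 - a, p, n - p)
  sqrt((n - 1) * (n + 1) * p * q / ((n - p) * n))
}

# 90% and 95% regions for a group of 100 observations
r90 <- pred_radius(n = 100, a = 0.10)
r95 <- pred_radius(n = 100, a = 0.05)

# For very large groups the radius approaches the chi-square based value
r_large <- pred_radius(n = 1e6, a = 0.10)
r_limit <- sqrt(qchisq(0.90, df = 2))
c(r90 = r90, r95 = r95, r_large = r_large, r_limit = r_limit)
```

The radius grows with the coverage and shrinks towards the chi-square limit as the group size increases, which is why small groups are drawn with visibly wider ellipses.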

Now, fit bivariate mixture models initiating the EM algorithm with the starting strategies described in the Section 2.3 of the manuscript.

i) Random starts

    initobj1 <- init.mix(dat, g, distr, ncov, nkmeans = 0, nrandom = 100,
                         nhclust = FALSE)
    obj1 <- EMMIX(dat, g, distr, ncov, init = initobj1, itmax = 1000,
                  epsilon = 1e-06)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5160
    ##
    ## $bic
    ## [1] 10424
    ##
    ## $aic
    ## [1] 10355
    ##
    ## $pro
    ## [1] 0.3037 0.3775 0.3188
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   51.42   33.88   72.43
    ## [2,] 1993.46 1752.30 2808.45
    ##
    ## $sigma
    ## , , 1
    ##
    ##         [,1]   [,2]
    ## [1,]   301.8  11121
    ## [2,] 11121.0 473365
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  187.4   9557
    ## [2,] 9557.0 562741
    ##
    ## , , 3
    ##
    ##        [,1]   [,2]
    ## [1,]  368.2   6279
    ## [2,] 6279.4 359595
    ##
    ##
    ## $ICL
    ## [1] -5349
    ##
    ##
    ## ------

    obj1$loglik

    ## [1] -5160

    classcolor(dat, ellipse = "TRUE", obj1, a = 0.1, n = size, g = g)

Figure F.1: Cluster partition found by the EM algorithm initiated from random starts. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

ii) K-means

    initobj2 <- init.mix(dat, g, distr, ncov, nkmeans = 100, nrandom = 0,
                         nhclust = FALSE)
    obj2 <- EMMIX(dat, g, distr, ncov, init = initobj2, itmax = 1000,
                  epsilon = 1e-06)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5150
    ##
    ## $bic
    ## [1] 10402
    ##
    ## $aic
    ## [1] 10333
    ##
    ## $pro
    ## [1] 0.4096 0.4390 0.1513
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   42.43   70.66   20.42
    ## [2,] 1866.97 2814.10 1070.22
    ##
    ## $sigma
    ## , , 1
    ##
    ##        [,1]   [,2]
    ## [1,]  118.2   3918
    ## [2,] 3917.7 253491
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  322.3   5272
    ## [2,] 5272.2 308424
    ##
    ## , , 3
    ##
    ##         [,1]   [,2]
    ## [1,]   37.65   2607
    ## [2,] 2607.11 225208
    ##
    ##
    ## $ICL
    ## [1] -5284
    ##
    ##
    ## ------

    obj2$loglik

    ## [1] -5150

    classcolor(dat, ellipse = "TRUE", obj2, a = 0.1, n = size, g = g)

Figure F.2: Cluster partition found by the EM algorithm initiated from the partition obtained by the K-means algorithm. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

iii) Simulated means

    mu <- c()
    mu[1] <- mean(dat[, 1])
    mu[2] <- mean(dat[, 2])
    S <- cov(dat)
    meaninit <- rmnorm(n = g, mean = mu, S)  # the initial values of the means
    sigmainit <- array(S, c(2, 2, g))        # the initial values of the
                                             # covariance matrices
    initobj1$pro <- rep(1/g, g)              # the initial values of the
                                             # mixing proportions
    initobj1$mu <- t(meaninit)
    initobj1$sigma <- sigmainit
    obj3 <- EMMIX(dat, g, distr, ncov, init = initobj1, itmax = 1000,
                  epsilon = 1e-06)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5150
    ##
    ## $bic
    ## [1] 10402
    ##
    ## $aic
    ## [1] 10333
    ##
    ## $pro
    ## [1] 0.1524 0.4032 0.4444
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   20.46   42.36   70.43
    ## [2,] 1073.41 1861.97 2808.18
    ##
    ## $sigma
    ## , , 1
    ##
    ##         [,1]   [,2]
    ## [1,]   37.85   2621
    ## [2,] 2621.27 226479
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  116.4   3866
    ## [2,] 3866.0 250825
    ##
    ## , , 3
    ##
    ##        [,1]   [,2]
    ## [1,]  324.3   5353
    ## [2,] 5352.7 310420
    ##
    ##
    ## $ICL
    ## [1] -5284
    ##
    ##
    ## ------

    obj3$loglik

    ## [1] -5150

    classcolor(dat, ellipse = "TRUE", obj3, a = 0.1, n = size, g = g)

Figure F.3: Cluster partition found by the EM algorithm initiated from simulated means. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

iv) Subsample solution

    sequence <- seq(1, size, 1)
    index <- sample(sequence, size = 200, replace = FALSE)
    subsample <- dat[index, ]
    initobjsub <- init.mix(subsample, g, distr, ncov, nkmeans = 0,
                           nrandom = 100, nhclust = TRUE)
    objsub <- EMMIX(subsample, g, distr, ncov, init = initobjsub, itmax = 10)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 1
    ##
    ## $loglik
    ## [1] -2397
    ##
    ## $bic
    ## [1] 4885
    ##
    ## $aic
    ## [1] 4829
    ##
    ## $pro
    ## [1] 0.3471 0.3572 0.2957
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   47.43   72.27   28.24
    ## [2,] 2020.94 2880.57 1359.73
    ##
    ## $sigma
    ## , , 1
    ##
    ##        [,1]   [,2]
    ## [1,]  171.2   4978
    ## [2,] 4977.6 284157
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  353.6   4516
    ## [2,] 4516.3 309734
    ##
    ## , , 3
    ##
    ##      [,1]   [,2]
    ## [1,]  105   4747
    ## [2,] 4747 312637
    ##
    ##
    ## $ICL
    ## [1] -2500
    ##
    ##
    ## ------

    initobj4 <- objsub
    obj4 <- EMMIX(dat, g, distr, ncov, init = initobj4, itmax = 1000,
                  epsilon = 1e-06)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5150
    ##
    ## $bic
    ## [1] 10402
    ##
    ## $aic
    ## [1] 10333
    ##
    ## $pro
    ## [1] 0.4059 0.4417 0.1524
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   42.41   70.54   20.46
    ## [2,] 1864.94 2811.11 1073.26
    ##
    ## $sigma
    ## , , 1
    ##
    ##        [,1]   [,2]
    ## [1,]  117.1   3887
    ## [2,] 3887.5 252103
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  323.3   5313
    ## [2,] 5313.0 309438
    ##
    ## , , 3
    ##
    ##         [,1]   [,2]
    ## [1,]   37.87   2621
    ## [2,] 2621.16 226382
    ##
    ##
    ## $ICL
    ## [1] -5284
    ##
    ##
    ## ------

    obj4$loglik

    ## [1] -5150

    classcolor(dat, ellipse = "TRUE", obj4, a = 0.1, n = size, g = g)

Figure F.4: Cluster partition found by the EM algorithm initiated from the mixture estimates obtained by running the EM algorithm on a random subsample of 200 observations. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

v) Short runs

    initobjshort1 <- init.mix(dat, g, distr, ncov, nkmeans = 0, nrandom = 100,
                              nhclust = FALSE)
    objshort1 <- EMMIX(dat, g, distr, ncov, init = initobjshort1,
                       epsilon = 0.01)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5160
    ##
    ## $bic
    ## [1] 10424
    ##
    ## $aic
    ## [1] 10355
    ##
    ## $pro
    ## [1] 0.3109 0.3822 0.3069
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   72.46   34.16   51.85
    ## [2,] 2803.79 1765.15 2006.83
    ##
    ## $sigma
    ## , , 1
    ##
    ##        [,1]   [,2]
    ## [1,]  371.9   6341
    ## [2,] 6341.4 363120
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  190.3   9673
    ## [2,] 9673.0 567372
    ##
    ## , , 3
    ##
    ##         [,1]   [,2]
    ## [1,]   314.9  11613
    ## [2,] 11613.4 490879
    ##
    ##
    ## $ICL
    ## [1] -5351
    ##
    ##
    ## ------

    initobjshort2 <- init.mix(dat, g, distr, ncov, nkmeans = 0, nrandom = 100,
                              nhclust = FALSE)
    objshort2 <- EMMIX(dat, g, distr, ncov, init = initobjshort2,
                       epsilon = 0.01)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5160
    ##
    ## $bic
    ## [1] 10424
    ##
    ## $aic
    ## [1] 10355
    ##
    ## $pro
    ## [1] 0.3071 0.3878 0.3051
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   52.29   34.33   72.52
    ## [2,] 2020.33 1771.45 2801.91
    ##
    ## $sigma
    ## , , 1
    ##
    ##         [,1]   [,2]
    ## [1,]   321.3  11862
    ## [2,] 11862.2 499889
    ##
    ## , , 2
    ##
    ##        [,1]   [,2]
    ## [1,]  192.2   9737
    ## [2,] 9737.0 569729
    ##
    ## , , 3
    ##
    ##      [,1]   [,2]
    ## [1,]  374   6359
    ## [2,] 6359 364909
    ##
    ##
    ## $ICL
    ## [1] -5351
    ##
    ##
    ## ------

    initobjshort3 <- init.mix(dat, g, distr, ncov, nkmeans = 0, nrandom = 100,
                              nhclust = FALSE)
    objshort3 <- EMMIX(dat, g, distr, ncov, init = initobjshort3,
                       epsilon = 0.01)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5160
    ##
    ## $bic
    ## [1] 10424
    ##
    ## $aic
    ## [1] 10355
    ##
    ## $pro
    ## [1] 0.3025 0.3099 0.3877
    ##
    ## $mu
    ##        [,1]    [,2]    [,3]
    ## [1,]   72.6   52.32   34.37
    ## [2,] 2803.0 2022.39 1774.12
    ##
    ## $sigma
    ## , , 1
    ##
    ##        [,1]   [,2]
    ## [1,]  373.9   6338
    ## [2,] 6338.4 364670
    ##
    ## , , 2
    ##
    ##         [,1]   [,2]
    ## [1,]   323.3  11922
    ## [2,] 11921.7 501949
    ##
    ## , , 3
    ##
    ##        [,1]   [,2]
    ## [1,]  192.7   9768
    ## [2,] 9767.7 571508
    ##
    ##
    ## $ICL
    ## [1] -5351
    ##
    ##
    ## ------

    objshort1$loglik

    ## [1] -5160

    objshort2$loglik

    ## [1] -5160

    objshort3$loglik

    ## [1] -5160

    max(objshort1$loglik, objshort2$loglik, objshort3$loglik)

    ## [1] -5160

If objshort3 provides the solution with the highest value of the log likelihood function among the short runs of the algorithm, use this solution to initiate a complete run of the algorithm.

    objlong <- EMMIX(dat, g, distr, ncov, init = objshort3, itmax = 100,
                     epsilon = 1e-06)

    ##
    ## ------
    ##
    ## 3 - Component Multivariate Normal Mixture Model
    ##
    ## ------
    ##
    ## $distr
    ## [1] "mvn"
    ##
    ## $error
    ## [1] 0
    ##
    ## $loglik
    ## [1] -5160
    ##
    ## $bic
    ## [1] 10424
    ##
    ## $aic
    ## [1] 10355
    ##
    ## $pro
    ## [1] 0.3189 0.3037 0.3774
    ##
    ## $mu
    ##         [,1]    [,2]    [,3]
    ## [1,]   72.42   51.41   33.87
    ## [2,] 2808.44 1993.19 1752.20
    ##
    ## $sigma
    ## , , 1
    ##
    ##        [,1]   [,2]
    ## [1,]  368.2   6280
    ## [2,] 6279.6 359576
    ##
    ## , , 2
    ##
    ##         [,1]   [,2]
    ## [1,]   301.7  11117
    ## [2,] 11117.0 473221
    ##
    ## , , 3
    ##
    ##        [,1]   [,2]
    ## [1,]  187.4   9556
    ## [2,] 9556.0 562705
    ##
    ##
    ## $ICL
    ## [1] -5349
    ##
    ##
    ## ------

    objlong$loglik

    ## [1] -5160

    classcolor(dat, ellipse = "TRUE", objlong, a = 0.1, n = size, g = g)

Figure F.5: Cluster partition found by the EM algorithm initiated from the mixture estimates after running several short runs of the EM algorithm. [Observations in different colours have been classified as belonging to different groups. The dots represent the joint means and the ellipses are the 90% prediction regions for each group].

    obj1$loglik

    ## [1] -5160

    obj2$loglik

    ## [1] -5150

    obj3$loglik

    ## [1] -5150

    obj4$loglik

    ## [1] -5150

    objlong$loglik

    ## [1] -5160

Delete spurious solutions. From the remaining solutions, select the one with the highest value of the log likelihood function. If there are no spurious solutions, proceed as follows:

    max(obj1$loglik, obj2$loglik, obj3$loglik, obj4$loglik, objlong$loglik)

    ## [1] -5150
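To see which starting strategies reach the highest log likelihood, rather than only its value, the fitted values can be compared by name. The sketch below is illustrative only: the vector `logliks` is typed in by hand from the rounded values printed in this appendix and is not extracted from the fitted objects.

```r
# Rounded log likelihoods reported above for each starting strategy
logliks <- c(obj1 = -5160, obj2 = -5150, obj3 = -5150, obj4 = -5150,
             objlong = -5160)

# Names of all solutions attaining the maximum, so that ties are visible
best <- names(logliks)[logliks == max(logliks)]
best
```

At this level of rounding, the K-means, simulated means and subsample strategies (obj2, obj3 and obj4) all attain the maximum, so any of them could be carried forward.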

If obj2 is the solution with the highest value of the log likelihood function, record its AIC, ICL and BIC values.

    obj2$aic

    ## [1] 10333

    obj2$ICL

    ## [1] -5284

    obj2$bic

    ## [1] 10402

This procedure is repeated starting with the maximum number of groups until fitting just one group. Finally, the model selected is the one which minimises the information criteria.

Bibliography

H. Akaike. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csaki, editors, Second International Symposium on Information Theory., pages 267–281. Budapest: Akademia Kiado, 1973.

H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.

R. Albrizio, M. Todorovic, T. Matic, and A. M. Stellacci. Comparing the interactive effects of water and nitrogen on durum wheat and barley grown in a Mediterranean environment. Field Crops Research, 115:179–190, 2010.

T. W. Anderson and I. Olkin. Maximum-likelihood estimation of the parameters of a multivariate normal distribution. Linear algebra and its applications, 70:147–171, 1985.

M. Andrews, P. J. Lea, J.A. Raven, and R.A. Azevedo. Nitrogen use efficiency. 3. Nitrogen fixation: genes and costs. Annals of Applied Biology, 155:1–13, 2009.

Australian Centre for Plant Functional Genomics. ACPFG Web site. http:// www.acpfg.com.au/search.php?q=Nitrogen%20Use%20Efficiency. Last accessed: 2014-06-12.

J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49:803–821, 1993.

T. Benaglia, D. Chauveau, D. R. Hunter, and D. S. Young. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32:1–29, 2009.

181 H. Bensmail, G. Celeux, A. E. Raftery, and C. P. Robert. Inference in model-based cluster analysis. Statistics and Computing, 7:1–10, 1997.

C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering with the Integrated Completed Likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:719–725, 2000.

C. Biernacki, G. Celeux, and G. Govaert. Choosing starting values for the EM algo- rithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 41:561–575, 2003.

C. Biernacki, G. Celeux, G. Govaert, and F. Langrognet. Model-based cluster and discriminant analysis with the MIXMOD software. Computational Statistics & Data Analysis, 51:587–600, 2006.

S. Borman. Topics in multiframe superresolution restoration. PhD thesis, the Univer- sity of Notre Dame, 2004.

A.F. Bouwman, G. Van Drecht, and K.W. Van der Hoek. Global and regional surface nitrogen balances in intensive agricultural production systems for the period 1970- 2030. Pedosphere, 15:137–155, 2005.

H. Bozdogan. On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Communications in Statistics-Theory and Methods, 19:221–278, 1990.

H. Bozdogan. Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In O. Opitz, B. Lausen, and R. Klar, editors, Information and classification, pages 40–54. Heidelberg: Springer, 1993.

G. Casella and R. L. Berger. Statistical inference. Pacific Grove, CA: Duxbury, 2nd edition, 2002.

K. G. Cassman, A. Dobermann, and D. T. Walters. Agroecosystems, nitrogen-use efficiency, and nitrogen management. Ambio, 31:132–140, 2002.

182 K.G. Cassman, S. Peng, D. C. Olk, J.K. Ladha, W. Reichardt, A. Dobermann, and U. Singh. Opportunities for increased nitrogen-use efficiency from improved resource management in irrigated rice systems. Field Crops Research, 56:7–39, 1998.

G. Celeux and G. Soromenho. An entropy criterion for assessing the number of clusters in a mixture model. Journal of classification, 13:195–212, 1996.

V. Chew. Confidence, prediction, and tolerance regions for the multivariate normal distribution. Journal of the American Statistical Association, 61:605–617, 1966.

G. W. Cochran. Sampling Techniques. New York: Wiley, 1977.

H. Cramer. Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.

J. Crossa and J. Franco. Statistical methods for classifying genotypes. Euphytica, 137: 19–37, 2004.

A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39:1–38, 1977.

M. Di Zio, U. Guarnera, and O. Luzi. Editing systematic unity measure errors through mixture modelling. Survey Methodology, 31:53–63, 2005.

M. Di Zio, U. Guarnera, and R. Rocci. A mixture of mixture models for a classification problem: The unity measure error. Computational statistics & data analysis, 51: 2573–2585, 2007.

J. Diebolt and C. P. Robert. Estimation of finite mixture distributions through bayesian sampling. Journal of the Royal Statistical Society. Series B (Methodological), 56:363– 375, 1994.

A. Dobermann and K.G. Cassman. Plant nutrient management for enhanced produc- tivity in intensive grain production systems of the United States and Asia. Plant and Soil, 247:153–175, 2002.

183 A. R. Dobermann. Nitrogen use efficiency–state of the art. In IFA International Workshop on Enhanced Efficiency Fertilizers. Frankfurt, 2005.

R.E. Evenson and D. Gollin. Assessing the impact of the green revolution, 1960 to 2000. Science, 300:758–762, 2003.

B. S. Everitt. An introduction to finite mixture distributions. Statistical Methods in Medical Research, 5:107–127, 1996.

B. S. Everitt, S. Landau, M. Leese, and D. Stahl. Cluster analysis. Chicester: Wiley, 5th edition, 2011.

Microsoft Excel. Microsoft Excel. Computer Software. Microsoft Corporation, Red- mond, Washington, 2010.

N.K. Fageria and V.C. Baligar. Enhancing nitrogen use efficiency in crop plants. Ad- vances in agronomy, 88:97–185, 2005.

FAO. Global agriculture towards 2050. In High Level Expert Forum. How to feed the world 2005. Rome: Food and Agriculture Organization of the United Nations, 2009.

E. C. Fieller. The distribution of the index in a normal bivariate population. Biometrika, 24:428–440, 1932.

E. C. Fieller. Some problems in interval estimation. Journal of the Royal Statistical Society. Series B (Methodological), 16:175–185, 1954.

M.A.T. Figueiredo and A.K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:381–396, 2002.

R. F. Follett. Innovative 15n microplot research techniques to study nitrogen use efficiency under different ecosystems. Communications in soil science and plant analysis, 32:951–979, 2001.

J.R.S. Fonseca and M.G.M.S. Cardoso. Mixture-model cluster analysis using informa- tion theoretical criteria. Intelligent Data Analysis, 11:155–173, 2007.

184 M.J. Foulkes, M.J. Hawkesford, P.B. Barraclough, M.J. Holdsworth, S. Kerr, S. Kight- ley, and P.R. Shewry. Identifying traits to improve the nitrogen economy of wheat: Recent advances and future prospects. Field Crops Research, 114:329–342, 2009.

D.B. Fowler. Crop nitrogen demand and grain protein concentration of spring and winter wheat. Agronomy Journal, 95:260–265, 2003.

C. Fraley and A. E. Raftery. How many clusters? Which clustering method? Answers via model-based cluster analysis. The computer journal, 41:578–588, 1998.

C. Fraley and A. E. Raftery. MCLUST: Software for model-based cluster analysis. Journal of Classification, 16:297–306, 1999.

V. H. Franz. Ratios: A short guide to confidence limits and proper use. arXiv preprint arXiv:0710.2024, 2007.

M. Friendly. Data ellipses, HE plots and reduced-rank displays for multivariate linear models: SAS software and examples. Journal of Statistical Software, 17:1–43, 2006.

S. Fr¨uhwirth-Schnatter. Finite mixture and Markov switching models. Springer: New York, 2006.

S. Fr¨uhwirth-Schnatter and S. Pyne. Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics, 11:317–336, 2010.

J. N. Galloway, J. D. Aber, J. W. Erisman, S. P. Seitzinger, R. W. Howarth, E. B. Cowling, and B. J. Cosby. The nitrogen cascade. Bioscience, 53:341–356, 2003.

A. Ganesalingam, A. B. Smith, C. P. Beeck, W. A. Cowling, R. Thompson, and B. R. Cullis. A bivariate mixed model approach for the analysis of plant survival data. Euphytica, 190:371–383, 2013.

R. C. Geary. The frecuency distribution of the quotient of two normal variates. Journal of the Royal Statistical Society, 93:442–446, 1930.

185 S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 6:721–741, 1984.

E. Genge. A latent class analysis of the public attitude towards the euro adoption in Poland. Adv. Data Anal. Classif. (in press), 2013.

J. K. Ghosh and P. K. Sen. On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, pages 789–806. Monterey: Wadsworth, 1985.

D. Giambalvo, P. Ruisi, G. Di Miceli, A. S. Frenda, and G. Amato. Nitrogen use efficiency and nitrogen fertilizer recovery of durum wheat genotypes as affected by interspecific competition. Agronomy Journal, 102:707–715, 2010.

S. S. Goyal and R. C. Huffaker. Nitrogen in crop production. In R.D. Hauck, editor, Nitrogen toxicity in plants. Madison: American Society of Agronomy- Crop Science Society of America- Soil Science Society of America, 1984.

R. J. Hathaway. A constrained formulation of maximum-likelihood estimation for normal mixture distributions. The Annals of Statistics, 13:795–800, 1985.

M. J. Hawkesford. Reducing the reliance on nitrogen fertilizer for wheat production. Journal of Cereal Science, 59:276–283, 2014.

D. V. Hinkley. On the ratio of two correlated normal random variables. Biometrika, 56:635–639, 1969.

B. Hirel, J. Le Gouis, B. Ney, and A. Gallais. The challenge of improving nitrogen use efficiency in crop plants: towards a more central role for genetic variability and quantitative genetics within integrated approaches. Journal of Experimental Botany, 58:2369–2387, 2007.

186 S. Ingrassia and R. Rocci. Constrained monotone EM algorithms for finite mixture of multivariate Gaussians. Computational Statistics & Data Analysis, 51:5339–5351, 2007.

IRRI. IRRISTAT User’s Manual Version 3. 1994.

M. Ishiguro, Y. Sakamoto, and G. Kitagawa. Bootstrapping log likelihood and EIC, an extension of AIC. Annals of the Institute of Statistical Mathematics, 49:411–434, 1997.

B.H. Janssen, F.C.T. Guiking, D. Van der Eijk, E.M.A. Smaling, J. Wolf, and H. Van Reuler. A system for quantitative evaluation of the fertility of tropical soils (QUEFTS). Geoderma, 46:299–318, 1990.

A Jasra, C. C. Holmes, and D. A. Stephens. Markov chain Monte Carlo methods and the label switching problem in bayesian mixture modeling. Statistical Science, 20: 50–67, 2005.

D. Karlis and E. Xekalaki. Choosing initial values for the EM algorithm for finite mixtures. Computational Statistics & Data Analysis, 41:577–590, 2003.

J. Kiefer and J. Wolfowitz. Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. The Annals of Mathematical Statistics, 27:887–906, 1956.

J. K. Ladha, H. Pathak, T. J. Krupnik, J. Six, and C. van Kessel. Efficiency of fertilizer nitrogen in cereal production: retrospects and prospects. Advances in Agronomy, 87:85–156, 2005.

C. D. Lai, G. R. Wood, and C. G. Qiao. The mean of the inverse of a punctured normal distribution and its application. Biometrical Journal, 46:420–429, 2004.

K. Lee, J. M. Marin, K. Mengersen, and C. Robert. In Proceedings of the Platinum Jubilee of the Indian Statistical Institute, chapter Bayesian inference on mixtures of distributions. Bangalore: Indian Statistical Institute, 2008.

187 P. M. Lee. Bayesian statistics: an introduction. Hoboken: Wiley, 4th edition, 2012.

P. J. Lenk and W. S. DeSarbo. Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika, 65:93–119, 2000.

Bruce G Lindsay. Mixture models: theory, geometry and applications. In NSF-CBMS regional conference series in probability and statistics, pages 1–163. Hayward: Insti- tute of Mathematical Statistics- Alexandria: American Statistical Association, 1995.

M. Liu, Z. Yu, Y. Liu, and N.T. Konijn. Fertilizer requirements for wheat and maize in China: The QUEFTS approach. Nutrient Cycling in Agroecosystems, 74:245–258, 2006.

X. Liu, P. He, J. Jin, W. Zhou, G. Sulewski, and S. Phillips. Yield gaps, indigenous nutrient supply, and nutrient use efficiency of wheat in China. Agronomy Journal, 103:1452–1463, 2011.

R. Maitra. Initializing partition-optimization algorithms. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6:144–157, 2009.

J. M. Marin, K. Mengersen, and C. P. Robert. Bayesian modelling and inference on mixtures of distributions. In C. Rao and D. Dey, editors, Handbook of Statistics, volume 25, pages 459–507. Elsevier: Amsterdan, 2005.

J. S. Marron and M. P. Wand. Exact mean integrated squared error. The Annals of Statistics, 20:712–736, 1992.

G. Marsaglia. Ratios of normal variables and ratios of sums of uniform variables. Journal of the American Statistical Association, 60:193–204, 1965.

G. Marsaglia. Ratios of normal variables. Journal of Statistical Software, 16:1–10, 2006.

H. Marschner and P. Marschner. Marschner’s mineral nutrition of higher plants. Lon- don: Elsevier, 2012.

188 G. J. McLachlan. On bootstrapping the likelihood ratio test stastistic for the number of components in a normal mixture. Journal of the Royal Statistical Society. Series C. (Applied Statistics), 36:318–324, 1987.

G. J. McLachlan and D. Peel. Finite mixture models. New York: Wiley, 2000.

G. J. McLachlan, D. Peel, K. E. Basford, and P. Adams. The EMMIX software for the fitting of mixtures of normal and t-components. Journal of Statistical Software, 4, 1999.

G.J. McLachlan, D. Peel, and W.J. Whiten. Maximum likelihood clustering via normal mixture models. Signal Processing: Image Communication, 8:105–111, 1996.

M. Meila and D. Heckerman. An experimental comparison of model-based clustering methods. Machine Learning, 42:9–29, 2001.

V. Melnykov. Challenges in model-based clustering. Wiley Interdisciplinary Reviews: Computational Statistics, 5:135–148, 2013.

V. Melnykov and R. Maitra. Finite mixture models and model-based clustering. Statis- tics Surveys, 4:80–116, 2010.

X.L. Meng and D. Van Dyk. The EM algorithm - an old folk-song sung to a fast new tune. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 59: 511–567, 1997.

D. Murdoch, E.D. Chow, and J.M. Frias Celayeta. ellipse: Functions for drawing ellipses and ellipse-like confidence regions. R package version 0.3-5, 2007.

K. Naklang, D. Harnpichitvitaya, S.T. Amarante, L.J. Wade, and S.M. Haefele. In- ternal efficiency, nutrient uptake, and the relation to field water resources in rainfed lowland rice of northeast Thailand. Plant and Soil, 286:193–208, 2006.

S. Newcomb. A generalized theory of the combination of observations so as to obtain the best result. American Journal of Mathematics, 8:343–366, 1886.

189 S. Ng. Recent developments in expectation-maximization methods for analyzing com- plex data. Wiley Interdisciplinary Reviews: Computational Statistics, 5:415–431, 2013.

C. Nicholson. The probability integral for two variables. Biometrika, 33:59–72, 1943.

O.E. Olarewaju, M.T. Adetunji, C.O. Adeofun, and I.M. Adekunle. Nitrate and phos- phorus loss from agricultural land: implications for nonpoint pollution. Nutrient Cycling in Agroecosystems, 85:79–85, 2009.

M.G.H. Omran, A. P. Engelbrecht, and A. Salman. An overview of clustering methods. Intelligent Data Analysis, 11:583–605, 2007.

B.N. Otteson, M. Mergoum, and J.K. Ransom. Seeding rate and nitrogen management effects on spring wheat yield and yield components. Agronomy Journal, 99:1615– 1621, 2007.

K. Pearson. Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London. A, 185:71–110, 1894.

D. Peel and G. J. McLachlan. Robust mixture modelling using the t distribution. Statistics and computing, 10:339–348, 2000.

D. Pe˜na. An´alisisde datos multivariantes. McGraw-Hill: Madrid, 2002.

T. Pham-Gia, N. Turkkan, and E. Marchand. Density of the ratio of two normal random variables and applications. Communications in StatisticsTheory and Methods, 35: 1569–1591, 2006.

C. G. Qiao, G. R. Wood, C. D. Lai, and D. W. Luo. Comparison of two common estimators of the ratio of the means of independent normal variables in agricultural research. Journal of Applied Mathematics and Decision Sciences, 2006:1–14, 2006.

R Core Team. R: A Language and Environment for Statistical Computing.R Foundation for Statistical Computing, Vienna, Austria, 2012. URL http://www. R-project.org/. ISBN 3-900051-07-0.

190 W. R. Raun and G. V. Johnson. Improving nitrogen use efficiency for cereal production. Agronomy Journal, 91:357–363, 1999.

R. A. Redner and H. F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26:195–239, 1984.

A. C. Rencher. Multivariate statistical inference and applications. New York: Wiley, 1998.

C. P. Robert and G. Casella. Monte Carlo statistical methods. New York: Springer, 2nd edition, 2004.

M. L. Samuels, J. A. Witmer, and A Schaffner. Statistics for the life sciences. Boston: Pearson Education, 2012.

SAS Institute. SAS Institute version 9.4. 2013.

G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461–464, 1978.

W. Seidel, K. Mosler, and M. Alker. A cautionary note on likelihood ratio tests in mixture models. Annals of the Institute of Statistical Mathematics, 52:481–487, 2000.

B. Seo and D. Kim. Root selection in normal mixture models. Computational Statistics & Data Analysis, 56:2454–2470, 2012.

S. Shanmugalingam. On the analysis of the ratio of two correlated normal variables. Journal of the Royal Statistical Society. Series D (The Statistician), 31:251–258, 1982.

T. R. Sinclair. Historical changes in harvest index and crop nitrogen accumulation. Crop Science, 38:638–643, 1998.

V. Smil. Nitrogen in crop production: An account of global flows. Global Biogeochemical Cycles, 13:647–662, 1999.

P. Smyth. Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing, 9:63–72, 2000.

191 J. A. Snyman. Practical mathematical optimization: an introduction to basic opti- mization theory and classical and new gradient-based algorithms. Boston: Springer, 2005.

J.H.J. Spiertz. Nitrogen, sustainable agriculture and food security. A review. Agronomy for Sustainable Development, 30:43–55, 2010.

SPSS. Systat user’s guide: Statistics, version 7.0. spss. Inc., Chicago, IL, 1997.

M. Stephens. Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62:795–809, 2000.

S. Takahashi, M. R. Anwar, and S. G de Vera. Effects of compost and nitrogen fertilizer on wheat nitrogen use in Japanese soils. Agronomy Journal, 99:1151–1157, 2007.

C. T´etard-Jones,P. N. Shotton, L. Rempelos, J. Cooper, M. Eyre, C. H. Orr, C. Leifert, and A.M.R. Gatehouse. Quantitative proteomics to study the response of wheat to contrasting fertilisation regimes. Molecular Breeding, 31:379–393, 2013.

D. Tilman, K. G. Cassman, P. A. Matson, R. Naylor, and S. Polasky. Agricultural sustainability and intensive production practices. Nature, 418:671–677, 2002.

D. M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical analysis of finite mixture distributions. New York: Wiley, 1985.

UN. Word population prospects: The 2012 revision, highlights and advance tables. United Nations, 2013.

K.K. Vinod and S. Heuer. Approaches towards nitrogen-and phosphorus-efficient rice. AoB Plants, pls 028, 2012.

U. Von Luxburg and V. H. Franz. Confidence sets for ratios: a purely geometric approach to Fieller’s theorem. Technical Report TR-133, Max Planck Institute for Biological Cybernetics, Giessen, Germany, 2004.

192 U. Von Luxburg and V. H. Franz. A geometric approach to confidence sets for ratios: Fieller’s theorem, generalizations, and bootstrap. Statistica Sinica, 19:1095–1117, 2009.

VSN International. Genstat for Windows 15th Edition. VSN International, Hemel Hempstead UK. 2012. URL http://www.vsni.co.uk/.

D. D. Wackerly, W. Mendenhall, and R. L. Scheaffer. Mathematical statistics with applications. Belmont:Duxbury, 5th edition, 1996.

Waite Institute. Waite Research Institute Web site. http:// waiteresearchinstitute.wordpress.com/tag/use-efficiency/, a. Last accessed: 2014-06-12.

Waite Institute. Waite Research Institute Web site. https://waiteresearchinstitute.wordpress.com/tag/ australian-centre-for-plant-functional-genomics/, b. Last accessed: 2014-06-12.

S. D. Walter, A. Gafni, and S. Birch. A geometric confidence ellipse approach to the estimation of the ratio of two variables. Statistics in Medicine, 27:5956–5974, 2008.

S.S. Wilks. The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9:60–62, 1938.

A. Willse and R. J. Boik. Identifiable finite mixtures of location models for clustering mixed-mode data. Statistics and Computing, 9:111–121, 1999.

C. Witt, A Dobermann, S. Abdulrachman, H.C. Gines, W. Guanghuo, R. Nagarajan, S. Satawatananont, T. Thuc Son, P. Sy Tan, L. Van Tiem, and D. C. Olk. Internal nutrient efficiencies of irrigated lowland rice in tropical and subtropical Asia. Field Crops Research, 63:113–138, 1999.

C. F. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11:95–103, 1983.

193 G. Xu, X. Fan, and A. J. Miller. Plant nitrogen assimilation and use efficiency. Annual review of plant biology, 63:153–182, 2012.

L. Xu, T. Hanson, E. J. Bedrick, and C. Restrepo. Hypothesis tests on mixture model components with applications in ecology and agriculture. Journal of Agricultural, Biological, and Environmental statistics, 15:308–326, 2010.

194