Categorical Data


Santiago Barreda, LSA Summer Institute 2019

Normally-Distributed Data

• We have been modelling data with normally-distributed residuals.
• In other words: the data are normally distributed around the mean predicted by the model.

Predicting Position by Height

• We will invert the questions we have been considering.
• Can we predict position from player characteristics?

Using an OLS Regression

• Not bad, but we can do better. In particular:
• There are hard bounds on our outcome variable.
• The boundaries affect the possible errors: e.g., position 1 cannot be underestimated, but position 2 can.
• There is nothing 'between' the outcomes.
• There are specialized models for data with these sorts of characteristics.

The Generalized Linear Model

• We can break a regression model up into three components:
• The systematic component.
• The random component.
• The link function.

  y = a + β * x + e

The Systematic Component

• This is our regression equation.
• It specifies a deterministic relationship between the predictors and the predicted value.
• In the absence of noise, and with the correct model, we would expect perfect prediction.

  μ = a + β * x      (μ is the predicted value, not the observation)

The Random Component

• Unpredictable variation conditional on the fitted value.
• The random component specifies the nature of that variation.
• The fitted value is the mean parameter of a probability distribution:

  μ = a + β * x
  y ~ Normal(μ, σ²)
  y ~ Bernoulli(μ)

Bernoulli Distribution

• Unlike the normal, this distribution generates only values of 1 and 0.
• It has only a single parameter, which must be between 0 and 1.
• The parameter is the probability of observing an outcome of 1 (and 1 − P of observing a 0):

  y ~ Bernoulli(μ)      e.g., [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

The Link Function

• We are modeling lines. This means our predicted values:
• Range from negative to positive infinity.
• Feature a consistent change (slope) across the range.
• These characteristics are not compatible with the mean parameter of some distributions (e.g., the Bernoulli).

The Logit Link Function

• Maps the entire number line to a range between 0 and 1.
• The logit function maps probabilities to logits, while its inverse (ilogit, the logistic function) maps logits back to probabilities:

  z = logit(p) = log(p / (1 − p))
  p = ilogit(z) = exp(z) / (1 + exp(z))

Can we Distinguish SF and SG?

• We predict the logit of the probability that a player is a small forward as a function of their height.
• This is then turned into a probability and used in a Bernoulli distribution:

  Fitted value (logits): a + H * height
  Fitted value (probability): ilogit(a + H * height)

(Figures: fitted logits and probabilities for SF vs. SG, with GLM estimates for comparison.)

Bayesian Logistic Regression

Likelihood (y is a vector containing only 1s and 0s):
  y ~ Bernoulli(θ)              (random component)
  θ = ilogit(μ)                 (link function)
  μ = a0 + H * height           (systematic component)

Priors:
  a0 ~ N(0, 1000²)
  H ~ N(0, 1000²)

(Figure: posterior distributions of a0 and H.)

Adding a Third Outcome

• Logistic regression compares an outcome ('success') to some reference outcome ('failure').
• Adding a third categorical outcome (PG, alongside SF and SG) means we now need multinomial logistic regression (which I'll call softmax regression, following Kruschke).

Softmax Regression

• In softmax regression, you have a different equation for each outcome category:

  μ_j = a0_j + H_j * height

• The 'weight' for each outcome is then put into the softmax function, yielding outcome probabilities for each category:

  P_j = exp(μ_j) / Σ_{k=1…K} exp(μ_k)

• For softmax regression, you set the weight of the reference category to zero; all other categories receive a predicted weight.
• The reference category is arbitrary (here, PG).
• I like to pick one in the middle of the predictor range because it makes the model easier to interpret.
• Logistic regression can be thought of as a special case of softmax regression.
• In logistic regression, you model a single line for the outcome called 'success', and failure is given a weight of 0:

  P_j = exp(μ_j) / (exp(0) + exp(μ_j)) = exp(μ_j) / (1 + exp(μ_j))

• This is just the logistic link function.

Softmax Regression

• Taking the log of the softmax probabilities:

  ln(P_j) = μ_j − ln(Σ_{k=1…K} exp(μ_k))

(Figures: ln P(category response) and P(category response) for SG, PG, and SF.)

Bayesian Softmax Regression

Likelihood (y is a vector of integers from 1 to J, the number of categories; for response categories 1…J):
  y ~ multinomial(θ_j)                          (random component)
  θ_j = exp(μ_j) / Σ_{k=1…K} exp(μ_k)           (link function)
  μ_j = a0_j + H_j * height                     (systematic component)

Priors:
  a0_j ~ N(0, 1000²)
  H_j ~ N(0, 1000²)

• We have a new loop: a different μ for each category and trial.
• We exponentiate μ but do not normalize (dcat does it for us).
• We set up priors for all of the J−1 coefficients, remembering to set one group to zero!

Softmax Regression: Results

• Results are coefficients specifying a different line for each category.

(Figures: fitted lines in logits for SG, SF, and PG.)

Bayesian Softmax Regression

• The values of the three lines are the fitted values for each of our categories, in log-odds.
• We can get the probability of observing each category for trial i by putting the fitted value of each category, for that trial, into the softmax function:

  μ_ij = a0_j + H_j * height_i
  P_ij = exp(μ_ij) / Σ_{k=1…K} exp(μ_ik)

Softmax Regression: Results

(Figures: data, fitted values in logits, and P(category selection) for PG, SG, and SF.)

Classification

• If you look up these and other topics:
• The Luce choice model.
• Bayesian decision theory.
• Linear discriminant analysis.
• You will see the softmax function, or something just like it.
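The softmax function, and its two-category reduction to ilogit, can be sketched in a few lines of Python (an illustrative translation, not the JAGS code used in this course; the fitted values below are made up):

```python
import math

def softmax(mus):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(mus)
    exps = [math.exp(mu - m) for mu in mus]
    total = sum(exps)
    return [e / total for e in exps]

def ilogit(z):
    # Inverse logit (the logistic function): maps the real line to (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted values (log-odds) for three categories at one height,
# with the reference category's weight fixed at 0.
mu = [0.0, 1.2, -0.4]
print(softmax(mu))  # three probabilities that sum to 1

# With two categories and the reference weight at 0, softmax reduces to ilogit.
z = 1.2
print(softmax([0.0, z])[1], ilogit(z))  # identical values
```

With only 'failure' (weight 0) and 'success' (weight z), the second softmax probability is exp(z) / (1 + exp(z)), which is exactly ilogit(z), matching the derivation above.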
Classification

• This framework is fundamental in research on decision-making and classification.
• It can be used with a decision rule: select the category that maximizes P.
• When we select based on a set of posterior probabilities, we have a Bayesian classifier:

  P(C_i | Y) = P(Y | C_i) * P(C_i) / Σ_{j=1…J} P(Y | C_j) * P(C_j)

Ordinal Logistic Regression

• Basketball positions are somewhere between ordered and unordered.
• They are numbered 1–5, and the numbers do correspond to increasing player size.
• On the other hand, they also seem at least somewhat arbitrarily ordered.

• We can use softmax regression whether or not our outcome categories are inherently ordered.
• But if our categories are inherently ordered, we may prefer ordinal logistic regression.
• This method allows us to predict categorical responses when the response categories have an inherent order.

Likert Scales

• Likert scales involve collecting ordered categorical responses, usually a few integers or phrases connoting degree of opinion (DBDA2, p. 681).
• The categorical responses are meant to reflect an underlying metric variable.
• Responses are usually averaged, yielding a metric variable that can be analyzed using normal-theory statistics.
• However, the individual responses can be modeled directly using ordinal logistic regression.
• Ordinal logistic analyses can provide more information than analyses that involve averaging.

Ordinal Logistic Regression

• Responses are assumed to depend on an underlying continuous variable.
• Conceptually, a normal distribution is placed at a given location along this continuous variable.
• We estimate the μ and σ of that distribution.
• We also estimate J−1 'thresholds' for J response values.
• The probability of observing a response of j is equal to the area under the curve between threshold j−1 and threshold j.
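The area-between-thresholds calculation can be sketched directly (an illustrative Python version, not the JAGS model used in the slides; the parameter values and thresholds are made up):

```python
import math

def normal_cdf(x, mu, sigma):
    # Phi((x - mu) / sigma), computed via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ordinal_probs(mu, sigma, thresholds):
    # P(response == j) is the area under Normal(mu, sigma) between
    # threshold j-1 and threshold j, with thresh_0 = -inf and thresh_J = +inf.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [normal_cdf(cuts[j + 1], mu, sigma) - normal_cdf(cuts[j], mu, sigma)
            for j in range(len(cuts) - 1)]

# Hypothetical example: J = 4 response categories, so J - 1 = 3 thresholds.
probs = ordinal_probs(mu=2.0, sigma=1.0, thresholds=[1.5, 2.5, 3.5])
print(probs)  # four probabilities that sum to 1
```

Shifting mu (for example, as a function of height) slides the normal distribution along the latent variable and reallocates probability across the ordered response categories, which is exactly how the regression version below works.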
(Figure: P(response == j) is the area of the normal distribution between adjacent thresholds.)

Ordinal Logistic Regression

• We can make this into a regression problem by modeling the mean parameter as a function of relevant predictors.

Setting up the Data

• The only unusual thing is that you have to set up a vector for the threshold parameters.
• We will set thresholds 1 and J−1 to 1.5 and J − 0.5.
• The rest are set to NA so that they are estimated by JAGS.

Ordinal Logistic Model

Likelihood (for response possibilities 1…J):
  y ~ multinomial(θ_j)                                  (random component)
  θ_j = Φ(thresh_j, μ, σ) − Φ(thresh_{j−1}, μ, σ)       (link function)
  μ = a0 + H * height                                   (systematic component)
  where thresh_0 = −∞ and thresh_J = ∞.

Priors:
  a0 ~ N(0, 100²)
  H ~ N(0, 100²)
  σ ~ halfCauchy(0, 10)
  thresh_j ~ N(0, 4)

Results

(Figures: posterior distributions for the thresholds, a0, and H; fitted distributions at heights of 6'0", 6'4", 6'8", and 7'0". Note that μ changes with height, but σ is fixed.)

Planning Notes

• 5 – regression and link functions.
• 6 – robustness; other things you can do: second-level predictors, heteroskedastic variances.
• 7 – interactions; relating lmer to formulas.
• 8 – errors? Tips for building models. A two-sample t-test.
• Univariate 'random effects' model, explained.
• Adding more predictors and interactions for each predictor (ANOVA-style decomposition).
• Decomposition explained (with logistic).
• The same for each parameter and notation.
• Link function explained.
• NOT multivariate draws.
• In general, I will build up prediction of position from height: first in the bivariate logistic case, then the multinomial for 5 classes, then ordinal.
• The ordinal case also helps with the logic of ordinal models: an underlying latent variable causes the ordinal response.
• Compare to numbered prediction.
