# loading packages
library(mlogit)
library(tidyverse)
library(stargazer)
library(rms)
library(readxl)
# loading data for ordered logit model
ordered<-data.frame(read_excel('ordered_health.xlsx'))
# checking the unique values for health status
unique(ordered$healthstatus)
## [1] "good" "excellent" "fair"
We have to assign numbers to the different levels of health status before we can estimate the ordered logit model. We will let fair=1, good=2, and excellent=3. Order is important so think carefully about how you assign numbers to the responses.
# this is an alterantive to using the ifelse function multiple times
ordered$healthnumber<- as.numeric(
recode(ordered$healthstatus,
'fair'='1',
'good'='2',
'excellent'='3'
)
)
We estimate the orderd logit model using the lrm() function.
# estimating the ordered logit model
# you need to rms package for this
ologit<-lrm(healthnumber~age+logincome+numberdiseases, data = ordered)
# viewing the output and coverting the coefficients to odds ratios
# the coefficients for 'y>=2' and 'y>=3' can be ignored
stargazer(ologit, coef = list(exp(ologit$coefficients)), type='text')
##
## ==========================================
## Dependent variable:
## ---------------------------
## healthnumber
## ------------------------------------------
## y> =2 4.039***
## (0.206)
##
## y> =3 0.386*
## (0.205)
##
## age 0.971***
## (0.002)
##
## logincome 1.328***
## (0.023)
##
## numberdiseases 0.946***
## (0.004)
##
## ------------------------------------------
## Observations 5,574
## R2 0.148
## chi2 740.388*** (df = 3)
## ==========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
We compare the odds ratios to 1. A person is less likely to report good or excellent health as age and the number of diseases increase. A person is more likely to report good or excellent health when their income increases.
This data describes a person’s choice to fish at a pier, on a charter boat, on the beach, or on private land. The variables price and catch describe the price and catch rate of the different options. We call these alternative specific variables. The income variable describes the person making the choice. This is an alternative invariant variable.
# loading the data
FishingWide <- data.frame(read_excel('multinomial.xlsx'))
# viewing the first few rows
head(FishingWide)
## mode price.beach price.pier price.private price.charter catch.beach
## 1 charter 157.930 157.930 157.930 182.930 0.0678
## 2 charter 15.114 15.114 10.534 34.534 0.1049
## 3 private 161.874 161.874 24.334 59.334 0.5333
## 4 pier 15.134 15.134 55.930 84.930 0.0678
## 5 private 106.930 106.930 41.514 71.014 0.0678
## 6 charter 192.474 192.474 28.934 63.934 0.5333
## catch.pier catch.private catch.charter income
## 1 0.0503 0.2601 0.5391 7.083332
## 2 0.0451 0.1574 0.4671 1.250000
## 3 0.4522 0.2413 1.0266 3.750000
## 4 0.0789 0.1643 0.5391 2.083333
## 5 0.0503 0.1082 0.3240 4.583332
## 6 0.4522 0.1665 0.3975 4.583332
This data is in wide form. This means we have one observation per choice. The mlogit function we will use below requires the data to be in long form. Since there are four alternatives, there will be four rows per individual in the long form data. We can reshape the data ising the dfidx() function.
FishingLong <- FishingWide %>%
dfidx(shape='wide', # this tells R the data are currently in wide form
choice='mode', # this is the variable that describes the choice
varying=2:9) # these are the variables that are alternative specific
# viewing the data
head(FishingLong)
## ~~~~~~~
## first 10 observations out of 4728
## ~~~~~~~
## mode income price catch idx
## 1 FALSE 7.083332 157.930 0.0678 1:each
## 2 TRUE 7.083332 182.930 0.5391 1:rter
## 3 FALSE 7.083332 157.930 0.0503 1:pier
## 4 FALSE 7.083332 157.930 0.2601 1:vate
## 5 FALSE 1.250000 15.114 0.1049 2:each
## 6 TRUE 1.250000 34.534 0.4671 2:rter
## 7 FALSE 1.250000 15.114 0.0451 2:pier
## 8 FALSE 1.250000 10.534 0.1574 2:vate
## 9 FALSE 3.750000 161.874 0.5333 3:each
## 10 FALSE 3.750000 59.334 1.0266 3:rter
##
## ~~~ indexes ~~~~
## id1 id2
## 1 1 beach
## 2 1 charter
## 3 1 pier
## 4 1 private
## 5 2 beach
## 6 2 charter
## 7 2 pier
## 8 2 private
## 9 3 beach
## 10 3 charter
## indexes: 1, 2
We now have four rows for each individual. The mode variable now identifies the choice the person made. The id2 variable indices the different choices that were presented.
We estimate the multinomial logit model using the mlogit function. Since we do not include alternative specific variable in the multinomial logit model, we put “1” to the left of the “|”. Alternative invariant variables go to the right of the “|”. We indicate the reference group using the reflevel option. Since we want to use the logistic model we set probit equal to FALSE. If we set this equal to TRUE R would estimate the multinomial probit model.
# estimating the model
mlogit<-mlogit(mode ~ 1 | income,
data=FishingLong,
reflevel="charter",
probit = FALSE)
# displaying the results and showing relative risk ratios.
stargazer(mlogit, coef=list(exp(mlogit$coefficients)), type='text')
##
## ===============================================
## Dependent variable:
## ---------------------------
## mode
## -----------------------------------------------
## (Intercept):beach 0.262
## (0.195)
##
## (Intercept):pier 0.590***
## (0.178)
##
## (Intercept):private 0.548***
## (0.136)
##
## income:beach 1.032***
## (0.042)
##
## income:pier 0.894***
## (0.044)
##
## income:private 1.132***
## (0.028)
##
## -----------------------------------------------
## Observations 1,182
## R2 0.014
## Log Likelihood -1,477.151
## LR Test 41.145*** (df = 6)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Again, we compare the coefficients to 1. Individuals are more likely to fish at the beach or on private land compared to fishing on a charter boat as income increases. Increases in income make an individual less likely to fish at a pier relative to a charter boat.
The conditional logit model is also estiamted using the mlogit function. The only difference between the multinomial and conditional models is that the conditional logit includes alternative specific variables. We add these to the left side of the “|”.
# estimating the model
clogit<-mlogit(mode ~ price+catch | income,
data=FishingLong,
reflevel="charter",
probit = FALSE)
# displaying the results and showing relative risk ratios.
stargazer(clogit, coef=list(exp(clogit$coefficients)), type='text')
##
## ===============================================
## Dependent variable:
## ---------------------------
## mode
## -----------------------------------------------
## (Intercept):beach 0.184
## (0.224)
##
## (Intercept):pier 0.400*
## (0.207)
##
## (Intercept):private 0.311*
## (0.159)
##
## price 0.975***
## (0.002)
##
## catch 1.430***
## (0.110)
##
## income:beach 1.034***
## (0.050)
##
## income:pier 0.910***
## (0.050)
##
## income:private 1.131***
## (0.029)
##
## -----------------------------------------------
## Observations 1,182
## R2 0.189
## Log Likelihood -1,215.138
## LR Test 565.171*** (df = 8)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The coefficients for income are interpreted as they were in the multinomial logit model except we are now holding constant the price and catch rate.
Notice we only have one coefficient for price and catch. This is because these variables describe the alternatives. Individuals are less likely to choose an option when it becomes more expensive and are more likely to choose an option when the catch rate increases.