# loading packages
library(AER)
library(tidyverse)
library(ggplot2)
library(lmtest)
library(plm)
library(stargazer)
# loading data that comes with the AER package
data('Fatalities')

Our goal is to estimate the impact of beer taxes on motor vehicle fatalities. The Fatalities dataset contains state-level motor vehicle fatalities for 48 states over 7 years. The variable “fatal” measures the number of fatalities in each state in each year. Since states with more people will have more accidents, we need to scale the number of fatalities by population. We will use fatalities per 10,000 residents.

# adding fatalities per 10,000 residents to the data frame
Fatalities <- Fatalities %>%
  mutate(fatpop=fatal/(pop/10000))

Since we will be including dummy variables for each state in the following regressions, we need the beer tax to vary over time within states. Any variable that does not vary over time within states will be dropped in a fixed-effects regression. We can plot fatalities against the beer tax and assign a different color to each state to see if we have the variation we need.

# plot with different colors for each state
ggplot(Fatalities,aes(x=beertax, y=fatpop)) +
  geom_point(aes(color=state)) +
  labs(x='Beer Tax',
       y='Fatalities per 10,000 Residents') +
  theme_minimal() +
  theme(legend.position='none') # removing the legend

Notice that the beer tax varies within each group of colors. The beer tax varies more for some states than others but we have the variation we need to estimate the fixed effects regression. If you want to see how the beer tax changes over time, you could assign a different color to each year. I will leave that as an exercise for you.

There are three options for estimating the impact of beer taxes on fatalities when working with panel data.

  1. Ignore the panel nature of the data
    \(Fatalities_{it} =\beta_0+\beta_1BeerTax_{it}+\epsilon_{it}\)
    This is the pooled model.
  2. Include a dummy variable for each entity. This will control for time-invariant entity-specific factors that are correlated with beer taxes and fatalities.
    \(Fatalities_{it} =\beta_0+\beta_1BeerTax_{it}+\alpha_i+\epsilon_{it}\).
    This is the one-way fixed effects model.
  3. Include a dummy variable for each entity and a dummy variable for each year. This will control for time-invariant entity-specific factors that are correlated with beer taxes and fatalities and time-varying factors that effect all entities.
    \(Fatalities_{it} =\beta_0+\beta_1BeerTax_{it}+\alpha_i+\lambda_t+\epsilon_{it}\).
    This is a two-way fixed effects model.

In this example, \(\alpha_i\) could be written: \(\sum_{i=1}^{47}\alpha_iState_i\). \(State_i\) is a dummy variable equal to one if the observation is from state \(i\) and \(\alpha_i\) is the coefficient for this dummy variable. We only included 47 state dummy variables to avoid the dummy variable trap.

\(\lambda_t\) contains a set of dummy variables indicating years. Since we observe fatalities for 7 years, we would include 6 year dummy variables.

We will estimate the panel data regressions using the plm() function in the plm library. The “index” option tells R which variables identify the entity and time period. The “model” option tells R which model we want to estimate: “pooling” includes no fixed effects, “within” is the fixed effects model. We add the “effect=‘twoways’” option when we want to estimate the two-way fixed effects model.

# Pooled model
nofe<- plm(fatpop~beertax, 
           data=Fatalities,
           index=c('state','year'),
           model = 'pooling')

# One-way fixed effects
statefe<- plm(fatpop~beertax, 
           data=Fatalities,
           index=c('state','year'),
           model = 'within') # this is the fixed effects option 

# Two-way fixed effects
stateyearfe<- plm(fatpop~beertax, 
              data=Fatalities,
              index=c('state','year'),
              model = 'within', 
              effect='twoways') # add this option to get state and year fixed effects

# Display the results
stargazer(nofe, statefe, stateyearfe, type='text',
          omit.stat='f')
## 
## ==========================================
##                   Dependent variable:     
##              -----------------------------
##                         fatpop            
##                 (1)       (2)       (3)   
## ------------------------------------------
## beertax      0.365***  -0.656*** -0.640***
##               (0.062)   (0.188)   (0.197) 
##                                           
## Constant     1.853***                     
##               (0.044)                     
##                                           
## ------------------------------------------
## Observations    336       336       336   
## R2             0.093     0.041     0.036  
## Adjusted R2    0.091    -0.120    -0.149  
## ==========================================
## Note:          *p<0.1; **p<0.05; ***p<0.01

We interpret the \(\beta\)’s in the usual way. In column 1, a $1 increase in the beer tax increases fatalities per 10,000 residents by 0.365. The results in column 2 are much different since we accounted for time-invariant entity-specific effects that may be correlated with beer taxes and fatalities. On average, a $1 increase in the beer tax decreases fatalities per 10,000 residents by 0.656. Year fixed effects were included in column 3. On average, a $1 increase in the beer tax decreases fatalities per 10,000 residents by 0.64.

Plotting the fitted values helps visualize what the fixed effects are doing. The pooled model estimates a single slope and intercept and ignores the panel nature of the data. The one-way fixed effects model includes a dummy variable for each state. Visually, this means each state has a separate intercept but we assume the slope is the same for all states.

The red line is the estimated regression line from the OLS regression. All states share a common slope and intercept. The colored line segments are from the one-way fixed effects regression. The slopes of each line segment are the same but the intercept for each firm is different.

Entity fixed effects caused the slope to switch from positive to negative. Recall the one-way fixed effects regression uses variation in the beer tax within a state to identify the \(\beta\)’s. On average, when a state increase their beer tax we observe a decrease in motor vehicle fatalities.

Recall the OLS assumption that the \(\epsilon\)’s are not correlated with each other. This assumption is often violated when working with panel data. Fatalities within a state will be correlated over time and, if not accounted for, will cause the standard errors to be unreliable. We can adjust the standard errors by clustering at the entity level. Clustering accounts for correlation in the error term over time within a state and produces reliable standard errors.

# adjusting the standard errors

# pooled model
nofe_c<-coeftest(nofe,vcovHC(nofe,
              type = 'HC1',
              cluster = 'group' # clusters on the entity
              )) 

# one way fixed effects
statefe_c<-coeftest(statefe,vcovHC(statefe,
              type = 'HC1',
              cluster = 'group' # clusters on the entity
              )) 

# two way fixed effects
stateyearfe_c<-coeftest(stateyearfe,vcovHC(stateyearfe,
              type = 'HC1',
              cluster = 'group' # clusters on the entity
              )) 

# display the results
stargazer(nofe_c, statefe_c, stateyearfe_c, type='text',
          omit.stat='f')
## 
## ======================================
##               Dependent variable:     
##          -----------------------------
##                                       
##             (1)        (2)      (3)   
## --------------------------------------
## beertax   0.365***  -0.656**  -0.640* 
##           (0.119)    (0.289)  (0.350) 
##                                       
## Constant  1.853***                    
##           (0.117)                     
##                                       
## ======================================
## ======================================
## Note:      *p<0.1; **p<0.05; ***p<0.01

Notice all of the standard errors are larger compared to the non-clustered standard errors. We still find a negative and statistically significant relationship between beer taxes and fatalities but the p-values are slightly smaller than they were before clustering.

You may have noticed that the plm() results do not include coefficients for the state and year dummy variables. We can view them using the fixef() function:

# viewing coefficients for the state and year dummy variables
# these represent the intercepts for each time period and entity
fixef(stateyearfe, effect=c('time')) # year
##   1982   1983   1984   1985   1986   1987   1988 
## 3.5114 3.4315 3.4390 3.3874 3.4735 3.4605 3.4596
fixef(stateyearfe) # states
##     al     az     ar     ca     co     ct     de     fl     ga     id     il 
## 3.5114 2.9645 2.8728 2.0262 2.0498 1.6712 2.2271 3.2513 4.0230 2.8624 1.5729 
##     in     ia     ks     ky     la     me     md     ma     mi     mn     ms 
## 2.0712 1.9871 2.3071 2.3166 2.6777 2.4171 1.8273 1.4234 2.0449 1.6349 3.4915 
##     mo     mt     ne     nv     nh     nj     nm     ny     nc     nd     oh 
## 2.2360 3.1716 2.0085 2.9332 2.2724 1.4302 3.9575 1.3485 3.2263 1.9076 1.8566 
##     ok     or     pa     ri     sc     sd     tn     tx     ut     vt     va 
## 2.9778 2.3660 1.7656 1.2696 4.0650 2.5232 2.6567 2.6128 2.3617 2.5610 2.2362 
##     wa     wv     wi     wy 
## 1.8742 2.6336 1.7754 3.3079

We can estimate the one-way or two-way fixed effects model using the lm_robust() function. The coefficients will be identical to the plm() estimates but the standard errors will be different. The other shortcoming is that the lm_robust() results cannot be exported using stargazer().