# changing the directory
# this is the path that contains the excel file you want to load
setwd('/Users/stp48131/Library/CloudStorage/Dropbox/WKU/Teaching/ECON_307/Class_Materials/Honors/Dummy_Variables')
# this package allows us to load excel files
library(readxl)
# data cleaning and manipulation
library(tidyverse)
# loading the excel file and assigning it to the data frame named dummy_data
dummy_data <- read_excel("Dummy_Variables_R.xlsx")
This data set contains the average starting salary and major for 50 graduates. We want to use regression to compare the starting salaries for the different majors. Notice that the “Major” column stores each graduate's major as a label (for example, “Acct” or “Econ”). To include major in a regression, we need to create a separate dummy variable for each major. We can do this using the ifelse() function.
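As a minimal sketch of how ifelse() works, the example below applies it to a small made-up vector of majors. The vector `majors` is hypothetical and is not the course data; it only illustrates that ifelse() checks a condition element by element and returns one value where the condition is true and another where it is false.

```r
# Hypothetical toy vector of majors (not the course data)
majors <- c("Acct", "Econ", "Fin", "Mgt", "Econ")

# ifelse(test, yes, no) checks each element of the test:
# it returns 1 where the major is "Econ" and 0 everywhere else
econ_major <- ifelse(majors == "Econ", 1, 0)

# Display the resulting dummy variable
print(econ_major)
```

The result has one entry per graduate, which is why it can be stored as a new column next to “Major”.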
The following code adds new columns to the data frame. Each new column equals 1 if a student had a particular major and 0 otherwise.
dummy_data2 <- dummy_data %>%
  mutate(acct_major = ifelse(Major == "Acct", 1, 0),
         econ_major = ifelse(Major == "Econ", 1, 0),
         fin_major  = ifelse(Major == "Fin", 1, 0),
         mgt_major  = ifelse(Major == "Mgt", 1, 0)
  )
The “%>%” operator is a pipe: it takes whatever is on its left side and feeds it into the function on its right as the first argument. In words, the code above says to start with the dummy_data data frame and mutate it by adding the variables acct_major, econ_major, fin_major, and mgt_major, then store the result as a new data frame called dummy_data2.
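To make the pipe concrete, here is a small sketch that writes the same mutate() call with and without “%>%”. The data frame `toy` and its values are made up for illustration and stand in for dummy_data; only dplyr (part of the tidyverse) is loaded.

```r
library(dplyr)

# Hypothetical toy data frame standing in for dummy_data (made-up values)
toy <- data.frame(Major  = c("Acct", "Econ", "Fin", "Mgt"),
                  St_Sal = c(36000, 29000, 34000, 25000))

# With the pipe: toy is passed in as the first argument of mutate()
with_pipe <- toy %>%
  mutate(econ_major = ifelse(Major == "Econ", 1, 0))

# Without the pipe: the same call with toy written out explicitly
without_pipe <- mutate(toy, econ_major = ifelse(Major == "Econ", 1, 0))

# Compare the two results
print(identical(with_pipe, without_pipe))
```

The pipe does not change what mutate() does. It only changes how the code reads, which helps when several steps are chained together.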
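We can now estimate the following regression: \(St\_Sal = a+b1*acct\_major+b2*econ\_major+b3*fin\_major+\epsilon\). Notice that mgt_major is left out of the equation, so management serves as the reference category that the other majors are compared against.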
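The regression uses only three of the four dummy variables. A minimal sketch of why, using made-up data: if all four dummies are included along with the intercept, they always sum to 1, so one of them is redundant and lm() cannot estimate it. The data frame `toy` and its salary values are hypothetical.

```r
# Hypothetical toy data (made-up numbers, not the course file)
toy <- data.frame(
  Major  = c("Acct", "Acct", "Econ", "Econ", "Fin", "Fin", "Mgt", "Mgt"),
  St_Sal = c(36000, 35000, 29000, 28000, 34000, 35500, 25000, 26000)
)

# Create all four dummy variables
toy$acct_major <- ifelse(toy$Major == "Acct", 1, 0)
toy$econ_major <- ifelse(toy$Major == "Econ", 1, 0)
toy$fin_major  <- ifelse(toy$Major == "Fin", 1, 0)
toy$mgt_major  <- ifelse(toy$Major == "Mgt", 1, 0)

# The four dummies sum to 1 for every row, duplicating the intercept
print(toy$acct_major + toy$econ_major + toy$fin_major + toy$mgt_major)

# Including all four dummies: lm() reports NA for the redundant one
reg_all <- lm(St_Sal ~ acct_major + econ_major + fin_major + mgt_major,
              data = toy)
print(coef(reg_all))
```

Dropping one dummy avoids this problem, and the dropped major becomes the baseline for interpreting the remaining coefficients.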
# Estimate the regression
# St_Sal = a + b1*acct_major + b2*econ_major + b3*fin_major
reg1 <- lm(St_Sal ~ acct_major + econ_major + fin_major, data = dummy_data2)
# displaying the results
summary(reg1)
##
## Call:
## lm(formula = St_Sal ~ acct_major + econ_major + fin_major, data = dummy_data2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6463.9 -1844.1    70.4  1676.1  4186.1
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  25164.7      666.0  37.787  < 2e-16 ***
## acct_major   10753.2     1004.0  10.711 4.38e-14 ***
## econ_major    3646.6      959.7   3.800 0.000424 ***
## fin_major     9314.2      980.3   9.502 2.01e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2492 on 46 degrees of freedom
## Multiple R-squared: 0.7681, Adjusted R-squared: 0.753
## F-statistic: 50.8 on 3 and 46 DF, p-value: 1.22e-14
The estimated regression equation is \(\hat{St\_Sal} = 25164.7+10753.2*acct\_major+3646.6*econ\_major+9314.2*fin\_major\)
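The estimated equation can be used to compute a predicted starting salary for each major by plugging in 0s and 1s for the dummy variables. The sketch below copies the coefficients from the regression output by hand; the object names `intercept`, `coefs`, `pred_mgt`, and `pred_other` are only illustrative.

```r
# Coefficients copied from the regression output
intercept <- 25164.7
coefs <- c(acct_major = 10753.2, econ_major = 3646.6, fin_major = 9314.2)

# Management has no dummy in the regression, so all three dummies equal 0
# and the prediction is just the intercept
pred_mgt <- intercept

# For each other major exactly one dummy equals 1,
# so the prediction is the intercept plus that major's coefficient
pred_other <- intercept + coefs

# Predicted starting salary by major
print(c(mgt_major = pred_mgt, pred_other))
```

For example, the predicted starting salary for an accounting major is 25164.7 + 10753.2 = 35917.9.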
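Notice that the p-values are all less than 0.05. Because management is the omitted category, each coefficient measures the difference in average starting salary between that major and management. This means that the average starting salaries for accounting, economics, and finance majors are each statistically different from the average for management majors at the 5% significance level.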
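This interpretation of the coefficients as differences from management can be checked directly. The sketch below uses made-up data (the data frame `toy` is hypothetical, not the course file) and compares the regression coefficients with the group averages computed by hand.

```r
# Hypothetical toy data (made-up numbers, not the course file)
toy <- data.frame(
  Major  = c("Acct", "Acct", "Econ", "Econ", "Fin", "Fin", "Mgt", "Mgt"),
  St_Sal = c(36000, 35000, 29000, 28000, 34000, 35500, 25000, 26000)
)

# Dummy variables for every major except management
toy$acct_major <- ifelse(toy$Major == "Acct", 1, 0)
toy$econ_major <- ifelse(toy$Major == "Econ", 1, 0)
toy$fin_major  <- ifelse(toy$Major == "Fin", 1, 0)

# Regression with management as the omitted category
reg_toy <- lm(St_Sal ~ acct_major + econ_major + fin_major, data = toy)

# Average starting salary for each major, computed directly
group_means <- sapply(split(toy$St_Sal, toy$Major), mean)

# Difference between each major's average and the management average
diffs <- group_means[c("Acct", "Econ", "Fin")] - group_means[["Mgt"]]

# The intercept should match the management average,
# and the slopes should match the differences
print(coef(reg_toy))
print(group_means[["Mgt"]])
print(diffs)
```

With only dummy variables on the right-hand side, the regression reproduces the group averages, which is why each p-value can be read as a test of whether a major's average salary differs from management's.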