# changing the directory 
# this is the path that contains the excel file you want to load
setwd('/Users/stp48131/Library/CloudStorage/Dropbox/WKU/Teaching/ECON_307/Class_Materials/Honors/Dummy_Variables')

# this package allows us to load excel files
library(readxl)
# data cleaning and manipulation
library(tidyverse)

# loading the excel file and assigning it to the data frame named dummy_data
dummy_data <- read_excel("Dummy_Variables_R.xlsx")

This data contains the starting salary and major for 50 graduates. We want to use regression to compare the starting salaries across majors. Notice that the “Major” column stores each graduate's major as text (Acct, Econ, Fin, or Mgt), so it cannot go into a regression directly. Instead, we need to create a separate dummy variable for each major, which we can do using the ifelse() function.

The following code adds new columns to the data frame, each equal to 1 if the graduate has that particular major and 0 otherwise.

dummy_data2 <- dummy_data %>% 
  mutate(acct_major=ifelse(Major=="Acct",1,0),
         econ_major=ifelse(Major=="Econ",1,0),
         fin_major=ifelse(Major=="Fin",1,0),
         mgt_major=ifelse(Major=="Mgt",1,0)
         )
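Because the Excel file is not reproduced here, the sketch below builds a small hypothetical data frame (`toy`, with made-up majors and salaries) and applies the same `ifelse()` logic in base R, so it runs without readxl or the tidyverse. Since each graduate has exactly one major, exactly one dummy should equal 1 in every row.

```r
# Hypothetical toy data: NOT the 50-graduate data set from the Excel file
toy <- data.frame(
  Major  = c("Acct", "Econ", "Fin", "Mgt", "Econ"),
  St_Sal = c(36000, 29000, 34000, 25000, 28500)
)

# Same ifelse() logic as the mutate() call: 1 if the major matches, 0 otherwise
toy$acct_major <- ifelse(toy$Major == "Acct", 1, 0)
toy$econ_major <- ifelse(toy$Major == "Econ", 1, 0)
toy$fin_major  <- ifelse(toy$Major == "Fin", 1, 0)
toy$mgt_major  <- ifelse(toy$Major == "Mgt", 1, 0)

# Every graduate has exactly one major, so the four dummies sum to 1 in each row
dummy_sums <- rowSums(toy[, c("acct_major", "econ_major", "fin_major", "mgt_major")])
print(toy)
print(dummy_sums)
```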

The “%>%” is a pipe that takes whatever is on its left side and feeds it into the function on its right. In words, the code above says: start with the dummy_data data frame, mutate it by adding the variables acct_major, econ_major, fin_major, and mgt_major, and store the result as a new data frame called dummy_data2.
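The same left-to-right idea can be seen with base R's native pipe `|>` (available in R 4.1 and later). It is a different operator from the `%>%` pipe used above, but in simple cases like this one it behaves the same way. The vector `salaries` is hypothetical and only for illustration.

```r
# Hypothetical salaries, for illustration only
salaries <- c(25000, 28500, 34000, 36000)

# Nested call: read from the inside out
nested <- round(mean(salaries))

# Piped call: read left to right; each result is fed into the next function
piped <- salaries |> mean() |> round()

print(c(nested = nested, piped = piped))
```

Both versions compute the same value; the piped form simply reads in the order the steps happen, which is why it is convenient for chains of data-cleaning steps.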

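We can now estimate the following regression: \(St\_Sal = a+b1*acct\_major+b2*econ\_major+b3*fin\_major+\epsilon\)

Notice that mgt_major is left out. Management serves as the reference (base) category: because every graduate has exactly one of the four majors, the four dummies always sum to 1, so including all of them along with the intercept a would create perfect multicollinearity (the dummy variable trap).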
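To see why one major has to be left out, the sketch below simulates hypothetical salary data (the group means in `true_means` are invented, not taken from the class data) and fits the model twice. With an intercept plus all four dummies, the dummies add up to the intercept column, so `lm()` cannot separate their effects and reports `NA` for the last one.

```r
set.seed(307)  # arbitrary seed so the simulation is reproducible

# Hypothetical data: 10 graduates per major with invented mean salaries
majors     <- rep(c("Acct", "Econ", "Fin", "Mgt"), each = 10)
true_means <- c(Acct = 36000, Econ = 29000, Fin = 34500, Mgt = 25000)
sim <- data.frame(Major  = majors,
                  St_Sal = unname(true_means[majors]) + rnorm(40, sd = 2500))

sim$acct_major <- ifelse(sim$Major == "Acct", 1, 0)
sim$econ_major <- ifelse(sim$Major == "Econ", 1, 0)
sim$fin_major  <- ifelse(sim$Major == "Fin", 1, 0)
sim$mgt_major  <- ifelse(sim$Major == "Mgt", 1, 0)

# All four dummies plus an intercept: perfect multicollinearity
fit_all <- lm(St_Sal ~ acct_major + econ_major + fin_major + mgt_major, data = sim)
print(coef(fit_all))   # mgt_major comes back as NA

# Dropping mgt_major makes management the reference category
fit_ref <- lm(St_Sal ~ acct_major + econ_major + fin_major, data = sim)
print(coef(fit_ref))
```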

# Estimate the regression
# St_Sal = a + b1*acct_major + b2*econ_major + b3*fin_major (mgt_major is the omitted category)
reg1 <- lm(St_Sal ~ acct_major + econ_major + fin_major, data = dummy_data2)

# displaying the results
summary(reg1)
## 
## Call:
## lm(formula = St_Sal ~ acct_major + econ_major + fin_major, data = dummy_data2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6463.9 -1844.1    70.4  1676.1  4186.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  25164.7      666.0  37.787  < 2e-16 ***
## acct_major   10753.2     1004.0  10.711 4.38e-14 ***
## econ_major    3646.6      959.7   3.800 0.000424 ***
## fin_major     9314.2      980.3   9.502 2.01e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2492 on 46 degrees of freedom
## Multiple R-squared:  0.7681, Adjusted R-squared:  0.753 
## F-statistic:  50.8 on 3 and 46 DF,  p-value: 1.22e-14

The estimated regression equation is \(\widehat{St\_Sal} = 25164.7+10753.2*acct\_major+3646.6*econ\_major+9314.2*fin\_major\)
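Plugging 0s and 1s into the estimated equation gives the predicted starting salary for each major. The short calculation below simply re-types the four estimates from the `summary(reg1)` output; with the real data loaded, `coef(reg1)` would return the same numbers without retyping.

```r
# Estimates copied from the summary(reg1) output above
b <- c(a = 25164.7, acct = 10753.2, econ = 3646.6, fin = 9314.2)

# Management is the omitted category, so all three dummies are 0 for it;
# for any other major, only that major's dummy is 1
pred_sal <- c(
  Mgt  = unname(b["a"]),
  Acct = unname(b["a"] + b["acct"]),
  Econ = unname(b["a"] + b["econ"]),
  Fin  = unname(b["a"] + b["fin"])
)
print(pred_sal)
# expected values: Mgt 25164.7, Acct 35917.9, Econ 28811.3, Fin 34478.9
```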

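Notice that the p-values on all three dummy coefficients are less than 0.05. Because management is the omitted category, the intercept (25164.7) is the estimated average starting salary for management majors, and each coefficient is the estimated difference between that major's average and management's. So at the 5% significance level (95% confidence), the average starting salaries for accounting, economics, and finance majors are statistically different from (here, higher than) the average for management majors.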
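This interpretation rests on a property of regressions that contain only dummy regressors: the intercept equals the sample mean of the omitted group, and each coefficient equals the gap between that group's mean and the omitted group's mean. The sketch below checks this on simulated, hypothetical data (the group means are invented) and pulls out the p-values that the 0.05 comparison uses.

```r
set.seed(307)  # arbitrary seed; data are simulated, not the class data set

# Hypothetical data: 10 graduates per major with invented mean salaries
majors     <- rep(c("Acct", "Econ", "Fin", "Mgt"), each = 10)
true_means <- c(Acct = 36000, Econ = 29000, Fin = 34500, Mgt = 25000)
sim <- data.frame(Major  = majors,
                  St_Sal = unname(true_means[majors]) + rnorm(40, sd = 2500))
sim$acct_major <- ifelse(sim$Major == "Acct", 1, 0)
sim$econ_major <- ifelse(sim$Major == "Econ", 1, 0)
sim$fin_major  <- ifelse(sim$Major == "Fin", 1, 0)

fit <- lm(St_Sal ~ acct_major + econ_major + fin_major, data = sim)

# Sample mean salary for each major
group_means <- tapply(sim$St_Sal, sim$Major, mean)

# Intercept = management mean; each slope = that major's mean minus management's
print(coef(fit))
print(group_means - group_means["Mgt"])

# p-value for each coefficient, and whether it falls below 0.05
p_vals <- coef(summary(fit))[, "Pr(>|t|)"]
print(p_vals < 0.05)
```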