Inferential Statistics

# setting the directory
setwd('/Users/stp48131/Library/CloudStorage/Dropbox/WKU/Teaching/ECON_307/Class_Materials/Honors/Inferential_Statistics')

# loading the libraries
library(tidyverse)
library(readxl)

# loading the data
data <- read_xlsx('Inferential_Statistics_R.xlsx')

The t.test() function is useful for hypothesis testing and for generating confidence intervals. This function will generate all of the sample statistics necessary for the test and produce the confidence interval for any given level of confidence. This is much more efficient than working through each step manually.

We will generate a 95% confidence interval for the mean sales price of a home.

results <- t.test(data$SalesPrice,
                  conf.level = 0.95)

# extracting the confidence interval
results$conf.int

## [1] 145770.8 183762.5
## attr(,"conf.level")
## [1] 0.95

The 95% confidence interval for the mean sales price of a house is (145770.8, 183762.5).
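
To make clear what t.test() is computing, the interval can be rebuilt by hand as \(\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}\). The sketch below is only an illustration: the vector `prices` is made-up data, not the values in Inferential_Statistics_R.xlsx, so its interval will differ from the one reported above.

```r
# Hypothetical sale prices, used only to illustrate the formula
prices <- c(150000, 172000, 139500, 201000, 165000,
            158000, 187500, 143000, 166900)

n      <- length(prices)          # sample size
xbar   <- mean(prices)            # sample mean
se     <- sd(prices) / sqrt(n)    # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # 95% two-sided: 2.5% in each tail

# Manual confidence interval: mean plus/minus critical value times SE
manual_ci <- c(xbar - t_crit * se, xbar + t_crit * se)
manual_ci

# The interval from t.test() on the same vector, for comparison
as.numeric(t.test(prices, conf.level = 0.95)$conf.int)
```

Both lines should print the same pair of numbers, which is the sense in which t.test() saves the manual steps.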

We can also use the t.test() function for hypothesis testing. The code below tests the following hypotheses: \(H_0: \mu=185,000\), \(H_a: \mu \ne 185,000\)

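t.test(data$SalesPrice,
       alternative = 'two.sided',  # H_a: mu != 185000
       mu = 185000,                # hypothesized mean under H_0
       paired = FALSE,             # only matters when a second sample is supplied
       var.equal = FALSE,          # only matters when a second sample is supplied
       conf.level = .95)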

## 
##  One Sample t-test
## 
## data:  data$SalesPrice
## t = -2.4562, df = 8, p-value = 0.03955
## alternative hypothesis: true mean is not equal to 185000
## 95 percent confidence interval:
##  145770.8 183762.5
## sample estimates:
## mean of x 
##  164766.7

We reject the null hypothesis because the p-value (0.03955) is less than \(0.05\).
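
As a rough check on where these numbers come from, the test statistic and p-value can be reproduced from the printed output alone. This sketch assumes n = 9 (implied by df = 8) and backs the standard error out of the printed 95% interval, so the results agree with the output only up to rounding.

```r
# Values copied from the t.test() output above
xbar  <- 164766.7                 # sample mean
lower <- 145770.8                 # lower 95% confidence limit
upper <- 183762.5                 # upper 95% confidence limit
n     <- 9                        # assumption: df = 8 implies n = 9
mu0   <- 185000                   # hypothesized mean under H_0

# Half-width of the interval = t critical value * standard error
se <- (upper - lower) / (2 * qt(0.975, df = n - 1))

# One-sample t statistic and two-sided p-value
t_stat  <- (xbar - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

t_stat
p_value
```

The two printed values should be close to the reported t = -2.4562 and p-value = 0.03955.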

The code below tests the following hypotheses: \(H_0: \mu=180,000\), \(H_a: \mu \ne 180,000\)

t.test(data$SalesPrice,
       alternative = 'two.sided',
       mu=180000,
       paired=FALSE,
       var.equal = FALSE,
       conf.level = .95)

## 
##  One Sample t-test
## 
## data:  data$SalesPrice
## t = -1.8493, df = 8, p-value = 0.1016
## alternative hypothesis: true mean is not equal to 180000
## 95 percent confidence interval:
##  145770.8 183762.5
## sample estimates:
## mean of x 
##  164766.7

We fail to reject the null hypothesis because the p-value is greater than 0.05.
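
The two-sided results at the 0.05 level line up with the 95% confidence interval: a hypothesized mean is rejected exactly when it falls outside the interval. The short sketch below checks this for the two values tested so far; `inside_ci` is a helper name introduced here for illustration, not part of any package.

```r
# 95% confidence interval copied from the t.test() output
ci <- c(145770.8, 183762.5)

# Hypothetical helper: TRUE when the hypothesized mean lies inside the interval
inside_ci <- function(mu0, ci) {
  mu0 >= ci[1] & mu0 <= ci[2]
}

inside_ci(185000, ci)  # FALSE: outside the interval, so H_0 is rejected
inside_ci(180000, ci)  # TRUE: inside the interval, so we fail to reject H_0
```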

The code below tests the following hypotheses: \(H_0: \mu\le170,000\), \(H_a: \mu > 170,000\)

t.test(data$SalesPrice,
       alternative = 'greater',
       mu=170000,
       paired=FALSE,
       var.equal = FALSE,
       conf.level = .95)

## 
##  One Sample t-test
## 
## data:  data$SalesPrice
## t = -0.6353, df = 8, p-value = 0.7285
## alternative hypothesis: true mean is greater than 170000
## 95 percent confidence interval:
##  149448.5      Inf
## sample estimates:
## mean of x 
##  164766.7

We fail to reject the null hypothesis because the p-value is greater than 0.05.
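
For a one-sided test the p-value is the area in a single tail of the t distribution. As a sketch, the upper-tail area can be recomputed from the printed statistic; because that statistic is rounded, the result should match the reported p-value only approximately.

```r
# Test statistic and degrees of freedom copied from the output above
t_stat <- -0.6353
df     <- 8

# alternative = 'greater' uses the upper tail: P(T > t_stat)
p_greater <- pt(t_stat, df = df, lower.tail = FALSE)
p_greater
```

Because the sample mean (164766.7) is below 170,000, the t statistic is negative and the upper-tail area is larger than 0.5, so the data give no support to \(H_a: \mu > 170,000\).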

The code below tests the following hypotheses: \(H_0: \mu\ge190,000\), \(H_a: \mu < 190,000\)

t.test(data$SalesPrice,
       alternative = 'less',
       mu=190000,
       paired=FALSE,
       var.equal = FALSE,
       conf.level = .95)

## 
##  One Sample t-test
## 
## data:  data$SalesPrice
## t = -3.0632, df = 8, p-value = 0.007754
## alternative hypothesis: true mean is less than 190000
## 95 percent confidence interval:
##      -Inf 180084.8
## sample estimates:
## mean of x 
##  164766.7

We reject the null hypothesis because the p-value is less than 0.05.
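
Rather than reading the decision off the printed output each time, the pieces of the test can be pulled out of the object that t.test() returns. The sketch below again uses a made-up `prices` vector (not the course data), and the names `alpha` and `decision` are introduced here for illustration.

```r
# Hypothetical sale prices, used only to illustrate the workflow
prices <- c(150000, 172000, 139500, 201000, 165000,
            158000, 187500, 143000, 166900)

alpha <- 0.05  # significance level

# Lower-tailed test of H_0: mu >= 190000 against H_a: mu < 190000
test <- t.test(prices, mu = 190000, alternative = 'less')

test$statistic   # t statistic
test$parameter   # degrees of freedom
test$p.value     # p-value

# Apply the decision rule used throughout this section
decision <- if (test$p.value < alpha) 'reject H0' else 'fail to reject H0'
decision
```

The same components (`statistic`, `parameter`, `p.value`, `conf.int`) are available for any of the tests above once the result is stored in an object, as was done with `results` earlier.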