Following the class notes, here is a visualization of the posterior probability for the estimation of the proportion p from a binomial sample. You can play around with the values of the sample size (N) and the count of 1s (X), watching how the posterior changes.
N <- 5
X <- round(N * 0.25)  # count of 1s (rounded, so the count is an integer)
curve(dbeta(x, shape1 = X + 1, shape2 = N - X + 1),
      from = 0, to = 1, n = 1000)
abline(v = X / N, col = "red")
You should see that, as N grows, the posterior moves away from the uniform distribution (the prior) and concentrates around the sample proportion X / N (the red line).
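To see this concentration directly, we can overlay the posteriors for several increasing sample sizes at the same proportion. This is a sketch, assuming p = 0.25 throughout and rounding X to an integer count:

```r
p <- 0.25
Ns <- c(5, 20, 100)  # increasing sample sizes

# start from the uniform prior, Beta(1, 1)
curve(dbeta(x, shape1 = 1, shape2 = 1), from = 0, to = 1,
      ylim = c(0, 10), n = 1000, ylab = "posterior density")

# add the posterior for each sample size
for (N in Ns) {
  X <- round(N * p)
  curve(dbeta(x, shape1 = X + 1, shape2 = N - X + 1),
        add = TRUE, n = 1000)
}
abline(v = p, col = "red")
```

Each successive curve is taller and narrower around the red line at p = 0.25.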
Let’s create an actual sample of data from a hypothetical binomial distribution, where 25% of data are equal to 1 and 75% of the data are equal to 0.
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
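The deterministic sample shown above (75 zeros followed by 25 ones) can be built directly; a minimal sketch, since the generating code is not shown in the notes:

```r
N <- 100
# 75 zeros followed by 25 ones, so exactly 25% of the data are 1
x <- c(rep(0, 0.75 * N), rep(1, 0.25 * N))
mean(x)  # the sample proportion, 0.25
```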
R's t.test() function, as we know, can give a confidence interval based on the t pivot statistic:
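The call producing the output below is presumably just t.test(x); it is repeated here, with the sample reconstructed, so the snippet runs on its own:

```r
# the sample from above (assumed): 75 zeros and 25 ones
x <- c(rep(0, 75), rep(1, 25))
t.test(x)
```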
##
## One Sample t-test
##
## data: x
## t = 5.7446, df = 99, p-value = 1.017e-07
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.163648 0.336352
## sample estimates:
## mean of x
## 0.25
We can, alternatively, use our Bayesian approach to find a credible region for the value p:
N <- length(x)
X <- sum(x)
cint <- c(
  qbeta(0.05 / 2, shape1 = X + 1, shape2 = N - X + 1),
  qbeta(1 - 0.05 / 2, shape1 = X + 1, shape2 = N - X + 1)
)
sprintf("95%% Credible Interval: p ∈ [%.03f, %.03f]", cint[1], cint[2])
## [1] "95% Credible Interval: p ∈ [0.176, 0.343]"
Notice that the interval is very similar to the one based on the t-test. The biggest advantage is that the credible interval is much simpler to interpret: given the observed data, p lies in the interval with 95% probability.
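A small illustration of that simpler interpretation: the posterior lets us make direct probability statements about p, which a frequentist confidence interval does not. A sketch reusing the same posterior, with the sample reconstructed so the snippet is self-contained:

```r
# the sample from above (assumed): 75 zeros and 25 ones
x <- c(rep(0, 75), rep(1, 25))
N <- length(x)
X <- sum(x)

# posterior probability that p exceeds 0.30, under the uniform prior
1 - pbeta(0.30, shape1 = X + 1, shape2 = N - X + 1)
```

No analogous statement ("the probability that p exceeds 0.30 is ...") can be read off from the t-test output.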