Let’s do in R what we did manually and roll a die 160 times.
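This is a single call to sample; something like the following (without the original random seed, the exact rolls will differ):
y <- sample(seq_len(6), size = 160, replace = TRUE)
y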
## [1] 1 4 1 2 5 3 6 2 3 3 1 5 5 2 6 6 2 1 5 5 1 1 6 5 5 2 2 6 1 4 1 4 3 6 2
## [36] 2 6 4 4 4 2 4 1 6 1 4 1 6 2 3 2 6 6 2 5 2 6 6 6 1 3 3 6 4 6 3 1 4 5 1
## [71] 1 6 4 5 5 4 6 5 4 4 1 5 5 6 1 1 3 6 2 2 3 6 2 4 3 5 2 2 1 3 3 2 2 5 2
## [106] 5 4 5 4 6 1 3 2 3 3 1 5 6 6 6 4 4 1 5 5 6 1 3 6 3 6 3 3 4 1 1 4 2 6 1
## [141] 2 3 4 1 3 5 3 4 2 1 4 1 4 2 5 2 2 2 3 1
We can turn this into a set of counts, which follow a multinomial distribution.
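With the rolls stored in y as above, the counts are just a table of the values:
x <- as.numeric(table(y))
x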
## [1] 30 29 24 25 23 29
What are the expected counts under the null hypothesis that all of the sides are equally likely? That’s just 1/6 times the number of samples.
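In R, this mirrors the expected counts e used in the simulation loop below:
n <- 160
e <- rep(1/6, 6) * n
e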
## [1] 26.66667 26.66667 26.66667 26.66667 26.66667 26.66667
And the G-score is given by the formula that we derived in the notes.
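With the observed counts x and expected counts e from above, this is the same line that appears inside the simulation loop below:
G <- -2 * sum( x * log(e / x) )
G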
## [1] 1.708677
This should have a chi-squared distribution with 6-1 degrees of freedom (the number of sides of the die, minus one because the last probability is determined by all of the others). So, what is the critical value for rejecting the null hypothesis at a 95% confidence level? (We always use a one-sided test with the log-likelihood ratio.)
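One way to compute it is with the chi-squared quantile function:
qchisq(0.95, df = 6 - 1)   # 95th percentile of a chi-squared with 5 degrees of freedom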
## [1] 11.0705
We see that we would not reject the null hypothesis since the critical value is larger than our test statistic. We could also convert the test statistic into a p-value.
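One way, using the upper tail of the same chi-squared distribution:
1 - pchisq(G, df = 6 - 1)   # probability of a value at least as large as our statistic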
## [1] 0.8878047
And again, we see that it is not significant.
Since it is easy in R, let’s simulate a large number of trials like the one we did manually, recording the G score for each trial.
N <- 10000                    # number of simulated trials
n <- 16 * 10                  # rolls per trial
G <- rep(NA, N)
e <- rep(1/6, 6) * n          # expected counts under the null
for (j in seq_along(G))
{
  y <- sample(seq_len(6), size = n, replace = TRUE)
  x <- as.numeric(table(y))
  G[j] <- -2 * sum( x * log(e / x) )
}
We can look at the distribution and see that it does in fact closely follow a chi-squared distribution with 6-1 degrees of freedom:
hist(G, breaks = 50, probability = TRUE, col = "white")
curve(dchisq(x, df = 6 - 1),
      from = min(G), to = max(G), add = TRUE, col = "red")
How often would we (incorrectly) reject the null hypothesis?
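A sketch of the check, comparing each simulated G-score to the 95% critical value (the exact proportion depends on the simulation seed):
mean(G > qchisq(0.95, df = 6 - 1))   # proportion of trials where we would reject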
## [1] 0.0541
This is almost exactly what we would want (0.05 is ideal), with the difference due partly to the chi-squared approximation and partly to the noise of the simulation.
Now, we will simulate the unfair die that we also rolled manually. Let’s start with a single set of rolls:
y <- sample(seq_len(6), size = n, replace = TRUE)
# re-roll every 4, 5, or 6 once, which biases the counts towards the small values
y[y >= 4] <- sample(seq_len(6), size = sum(y >= 4), replace = TRUE)
table(y)
## y
## 1 2 3 4 5 6
## 40 37 35 19 17 12
The bias towards the small values is fairly noticeable. Let’s compute the G-score:
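something like this, reusing the fair-die expected counts e from above:
x <- as.numeric(table(y))
-2 * sum( x * log(e / x) )   # the G-score for this single set of unfair rolls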
## [1] 28.35575
This has the same null distribution as above, so, comparing against the critical value from above, we can see that the null hypothesis will be rejected. What’s the p-value?
## [1] 3.100986e-05
So, we see that we reject the null hypothesis since the p-value is much smaller than the cut-off value of 0.05 (or even 0.001).
Just as above, we can simulate this for a large number of trials.
N <- 10000
n <- 16 * 10
G <- rep(NA, N)
e <- rep(1/6, 6) * n          # expected counts under the fair-die null
for (j in seq_along(G))
{
  y <- sample(seq_len(6), size = n, replace = TRUE)
  y[y >= 4] <- sample(seq_len(6), size = sum(y >= 4), replace = TRUE)   # unfair die
  x <- as.numeric(table(y))
  G[j] <- -2 * sum( x * log(e / x) )
}
How often do we actually reject the null hypothesis here?
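One way to tabulate this, again comparing to the 95% critical value:
table(G > qchisq(0.95, df = 6 - 1))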
##
## TRUE
## 10000
Always! That’s why I felt confident, when I made you do this in class, that we would get a significant result manually as well. Let’s see how this compares to the chi-squared:
hist(G, breaks = 50, probability = TRUE, col = "white")
curve(dchisq(x, df = 6 - 1),
      from = min(G), to = max(G), add = TRUE, col = "red")
As hoped, we see that the distribution is very different from the chi-squared, giving the test sufficient power to detect deviations from the null hypothesis.
No more die-rolling. But let’s look at one final extension of this technique in R for today. We will generate some data from a Poisson distribution with lambda equal to 2.
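A sketch of the generation, using the same settings as the simulation loop further below (the exact values depend on the random seed):
n <- 200
y <- rpois(n, lambda = 2)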
We will ignore that we know how the data were generated. Let’s look at a table of the values:
## y
## 0 1 2 3 4 5 6
## 25 64 43 43 13 7 5
Let’s say we wanted to do a hypothesis test with the null hypothesis that the data in fact came from a Poisson distribution. If we also knew lambda, that would be easy with the approach above, but we don’t. So, instead, we estimate the value of lambda using the MLE, which for the Poisson is just the sample mean.
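In code, this mirrors the lhat line in the simulation loop below:
lhat <- mean(y)
lhat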
## [1] 1.98
Another difficulty is that the largest values are fairly rare, which leads to very small expected counts. We will use a standard technique of collapsing all of the large values into a collective category (here, I chose the values of 4 and above). The expected probabilities would then be:
ptilde <- dpois(seq(0, 4), lambda = lhat)
# put all of the remaining probability mass, P(Y >= 4), into the final category
ptilde[length(ptilde)] <- 1 - sum(ptilde[-length(ptilde)])
ptilde
## [1] 0.1380692 0.2733771 0.2706433 0.1786246 0.1392858
And the expected counts:
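these are just the estimated probabilities times the sample size, as in the loop below:
e <- ptilde * n
e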
## [1] 27.61385 54.67542 54.12866 35.72492 27.85715
The observed counts, collapsing the largest values into a catch-all final category, are as follows.
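A sketch, using the same collapsing step as in the simulation loop below:
x <- as.numeric(table(y))
x[5] <- sum(x[seq(5, length(x))])   # fold counts for values 4 and above into the fifth entry
x <- x[seq(1, 5)]
x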
## [1] 25 64 43 43 25
Now, we just compute G as before.
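This is the same line as in the simulation loop below:
G <- -2 * sum( x * log(e / x) )
G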
## [1] 5.919403
How many degrees of freedom are there? Well, we have 5 categories, so our parameter space (big Theta) has 5-1 degrees of freedom. This time, however, the null hypothesis has 1 degree of freedom (corresponding to estimating the value of lambda). So, the degrees of freedom of the chi-squared should be (5 - 1) - 1, or 3. Here is the correct computation of the p-value:
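assuming the statistic above is stored in G, this is the upper tail of a chi-squared with 3 degrees of freedom:
1 - pchisq(G, df = (5 - 1) - 1)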
## [1] 0.115598
So, as desired (since we did in fact generate the data from a Poisson distribution), we fail to reject the null hypothesis.
Let’s do what we did above again: simulate this many times.
N <- 10000
n <- 200
G <- rep(NA, N)
for (j in seq_along(G))
{
  y <- rpois(n, lambda = 2)
  lhat <- mean(y)                                  # MLE of lambda
  ptilde <- dpois(seq(0, 4), lambda = lhat)
  ptilde[length(ptilde)] <- 1 - sum(ptilde[-length(ptilde)])
  e <- ptilde * n
  x <- as.numeric(table(y))
  x[5] <- sum(x[seq(5, length(x))])                # collapse counts for values 4 and above
  x <- x[seq(1, 5)]
  G[j] <- -2 * sum( x * log(e / x) )
}
The histogram does generally follow the chi-squared distribution with (5 - 1) - 1 = 3 degrees of freedom:
hist(G, breaks = 50, probability = TRUE, col = "white")
curve(dchisq(x, df = 5 - 1 - 1),
      from = min(G), to = max(G), add = TRUE, col = "red")
And the probability of falsely rejecting the null hypothesis, again, is as expected.
## [1] 0.0545
Let’s do one last simulation, this time jumping right to the larger version. Here, we will generate data from the zero-inflated Poisson distribution (or ZIP): it’s like a Poisson, but some percentage of the time (we will use 10%) we artificially convert the value to zero. Let’s see if we can detect this in the simulation:
N <- 10000
n <- 200
G <- rep(NA, N)
for (j in seq_along(G))
{
  y <- rpois(n, lambda = 2)
  y[runif(length(y)) < 0.1] <- 0                   # zero-inflation: convert about 10% of values to zero
  lhat <- mean(y)
  ptilde <- dpois(seq(0, 4), lambda = lhat)
  ptilde[length(ptilde)] <- 1 - sum(ptilde[-length(ptilde)])
  e <- ptilde * n
  x <- as.numeric(table(y))
  x[5] <- sum(x[seq(5, length(x))])
  x <- x[seq(1, 5)]
  G[j] <- -2 * sum( x * log(e / x) )
}
How often do we (correctly, in this case) reject the null hypothesis that this is a proper Poisson distribution:
## [1] 0.5645
A little more than half of the time. Not nearly as powerful as in the die-roll case, but the ZIP with 10% of the values converted to zero is also not all that different from a true Poisson. If we increased the percentage or increased lambda, the difference would be easier to detect.