MLE

Let’s simulate some data from an exponential distribution with mean λ = 4. Note that R’s rexp() is parameterized by the rate, which is 1/λ = 1/4. Here’s a single sample of 50 observations:

x <- rexp(50, rate = 1/4)
x
##  [1]  3.0207273  4.7265711  0.5828269  0.5591810  1.7442745 11.5798741
##  [7]  4.9182482  2.1587314  3.8262700  0.5881840  5.5629405  3.0481194
## [13]  4.9504142 17.6957369  4.2181727  4.1409758  7.5041407  2.6189865
## [19]  1.3477339  2.3539189  9.4580610  2.5675704  1.1764816  2.2634621
## [25]  0.4242905  0.2377566  2.3148499 15.8357314  4.6932484  3.9872518
## [31]  5.7411414  0.1490741  1.2960406  5.2818717  0.8140414  4.0909035
## [37]  1.2069637  2.9008572  3.0061708  0.9401098  4.3195245  4.1129876
## [43]  5.1690466  5.0124214  2.2185656  1.2051320  5.1724986  3.9782232
## [49]  2.0566972  8.0313296

Estimating the mean is not particularly interesting, as it is the same whether we use the sample mean or the MLE. The variance is a different story. We have two options: the MLE (the squared sample mean) and the sample variance (S_X^2). Note that the theoretical variance should be 16.
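Why is the MLE of the variance the squared sample mean? For exponential data the MLE of the rate is 1/mean(x), and the variance is one over the rate squared, so the plug-in estimate collapses to mean(x)^2. A quick sanity check (using a fresh sample, just for illustration):

```r
# For Exponential(rate) data, the MLE of the rate is 1 / mean(x),
# and the variance is 1 / rate^2, so the plug-in (MLE) estimate of
# the variance is simply mean(x)^2.
x <- rexp(50, rate = 1/4)       # a fresh illustrative sample
lambda_hat <- 1 / mean(x)       # MLE of the rate
isTRUE(all.equal(1 / lambda_hat^2, mean(x)^2))
## [1] TRUE
```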

sprintf("MLE Variance: %02.02f, Sample Variance: %02.02f", mean(x)^2, var(x))
## [1] "MLE Variance: 15.49, Sample Variance: 12.72"

The specific values will change each time we run the code, but you should see that there is (usually) a noticeable difference between the two estimators. Let’s run a simulation to see what happens in the long run:

N <- 1e5
mle <- rep(NA, N)
svar <- rep(NA, N)
for (j in seq_len(N)) {
  x <- rexp(50, rate = 1/4)
  mle[j] <- mean(x)^2
  svar[j] <- var(x)
}
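As a stylistic aside (not essential to the argument), the same simulation can be written without an explicit loop using replicate(), which evaluates an expression N times and collects the results into a matrix:

```r
N <- 1e5
# Each replication returns a named vector; replicate() stacks these
# into a 2 x N matrix with rownames "mle" and "svar".
sims <- replicate(N, {
  x <- rexp(50, rate = 1/4)
  c(mle = mean(x)^2, svar = var(x))
})
mle  <- sims["mle", ]
svar <- sims["svar", ]
```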

Here is the empirical bias of the sample variance (recall that its theoretical bias is zero):

mean(svar - 16)
## [1] 0.0161362

Not so bad; the small nonzero value is just simulation noise, since the sample variance is exactly unbiased. What about the empirical bias of the MLE? Recall that this is not theoretically equal to zero for any finite sample size:

mean(mle - 16)
## [1] 0.3335618
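This matches the theory: since E[X̄²] = Var(X̄) + E[X̄]² = 16/50 + 16, the bias of the MLE is exactly 16/n:

```r
# Theoretical bias of the MLE mean(x)^2 with n = 50 observations:
# E[xbar^2] - 16 = Var(xbar) = 16 / n
16 / 50
## [1] 0.32
```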

Okay, so the bias is not huge, but it is many times larger than that of the sample variance. Is the MLE a terrible choice here? Well, not really. Let’s look at the variance of the two estimators (note that these are the variances of the estimators of the variance):

sprintf("Var of MLE: %02.02f, Var of S_X^2: %02.02f", var(mle), var(svar))
## [1] "Var of MLE: 21.52, Var of S_X^2: 41.04"

So, the sample variance has a much larger variance. It is unbiased on average, but it wobbles a lot more around the correct value. How can we balance bias and variance? One way is to look at the average distance to the true variance (this is called the risk of the estimator). We can do this using any distance metric, such as the absolute value:

sprintf(
  "Avg. Dist MLE: %02.02f, Avg. Dist S_X^2: %02.02f",
  mean(abs(mle - 16)), mean(abs(svar - 16))
)
## [1] "Avg. Dist MLE: 3.63, Avg. Dist S_X^2: 4.85"

A related metric is the root mean squared error, or RMSE, which we can compute as follows:

sprintf(
  "RMSE MLE: %02.02f, RMSE S_X^2: %02.02f",
  sqrt(mean((mle - 16)^2)), sqrt(mean((svar - 16)^2))
)
## [1] "RMSE MLE: 4.65, RMSE S_X^2: 6.41"
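The squared-error risk is useful because it decomposes exactly into the two quantities we have been comparing: MSE = Var(estimator) + Bias². A quick numerical check of this identity (a sketch using a fresh, smaller simulation; the seed is only for reproducibility):

```r
set.seed(1)                          # for reproducibility only
N <- 1e4
mle  <- replicate(N, mean(rexp(50, rate = 1/4))^2)
mse  <- mean((mle - 16)^2)           # Monte Carlo MSE
bias <- mean(mle) - 16               # Monte Carlo bias
pvar <- mean((mle - mean(mle))^2)    # variance with a 1/N divisor
isTRUE(all.equal(mse, pvar + bias^2))
## [1] TRUE
```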

So, we see that using these metrics the MLE has a better risk, at the expense of being biased.