Correlation is a way of measuring the linear relationship between two random samples. Specifically, a correlation of 0 indicates no linear relationship, +1 a perfect positive relationship, and -1 a perfect negative relationship. We can define it as the signed square root of the R-square measurement from a simple linear regression. Recall that this is the variance of the residuals divided by the variance of the Ys.

In the box below, I randomly pick a correlation and generate some data with that structure. Then, draw a plot. Can you guess

rho <- round(runif(1, -1, 1), 3)
X <- scale(matrix(rnorm(50 * 2), ncol = 2))
X <-  X %*% solve(chol(var(X))) 
C <- matrix(c(1, rho, rho, 1), 2, 2)
X <- X %*% chol(C)
colnames(X) <- c("x", "y")
df <- as_tibble(X)
ggplot(df, aes(x, y)) +
  geom_point()

Let’s see how well we can guess the answer!