# Class 06: Bikes (and Matrices!)

### 2017-09-14

## Linear Models with Matrices in R

Today, we will look at a dataset of bike sharing usage in Washington, DC. Our task is to predict how many mikes are rented each day:

We are going to fit linear models to the dataset, but I want
to modify how we build the models. Specifically, I want to
manually construct the **data matrix** rather than having it
made behind the scenes by R.

## Matrices

### Linear Maps

Consider all functions from an n-dimensional space into an m-dimensional space,

That preserves addition,

And multiplication by a fixed scalar value,

Such functions are known as **linear maps** (the definition can be abstracted
to infinite dimensional spaces and over other fields, but we will only need
the finite real-valued case in this course).

There is an important representation theorem stating that every such mapping
can be described by an n-by-m array of real numbers. This array is commonly
called a **matrix**. How exactly do we calculate the function f given its
matrix representation A? It is easier to explain with fixed values for n
and m. Let’s consider the following matrix:

So, in this case we have n = 3 and m = 2. Therefore this is a mapping that takes a triple of numbers and returns a pair of numbers. Let’s define the input as a vector u:

And the output as a vector v:

The linear map is then defined as:

So, each component of v is a linear combination of all components of u. We can write this compactly using summation notation:

Conveniently, this last equation holds for any arbitrary choice of n and m. Finally, we represent this symbolically by

### Function composition

Consider two linear maps:

If we apply f to an vector in n-dimensional space we get an m-dimensional vector. We could then take the output of this map and apply g to it in order to get a p-dimensional vector. This is known a function composition. We can represent the action of first applying f and then applying g as a new function h:

It is a fairly intuitive and easy to prove result that if f and g are linear maps, so is h. Let f be represented by the matrix A, g by the matrix B, and h by the matrix C. A natural question is what relationship exists between A, B, and C?

It turns out that the result is just another sum:

This is known as a **matrix product** and is written as:

If the concept of matrix multiplication is new to you, an animated visualization of multiplying two matricies can be useful. Of course, for this semester understanding the abstract meaning behind matrix multiplication (function composition) is much more important than grasping the mechanics of computing the new matrix:

Notice that if we represent vectors as one-column matricies, the definition of matrix multiplication is equivalent to our defintion of applying a matrix to a vector:

## Matrices in R

### Basics

There is extensive functionality for working with matricies in R. Let’s make a random 5-by-5 matrix and a 5-by-1 matrix, the latter we can think of as a vector.

Element-wise arithmetic is assumed by default:

To calculate a matrix produce, R requires us to use the
symbol `%*%`

. If a matrix is square, there is nothing
stopping us from composing a matrix with itself:

Similarly, we can multiply the matrix by the column vector to compute the action of the matrix A as a linear map:

The transpose of a matrix, denoted by using t as a superscript, can be computed with the t function:

This is often useful in matrix computations. Similarly, we
can computing the inverse of a matrix with the function
`solve`

:

A matrix inverse, by definition, describes the inverse of the underlying linear map (its also relatively simple to show that linear maps have linear inverses, when they exist). Note that matrix inversion is a computationally unstable procedure and should be avoided when possible.

### Subsetting matricies

It will often be useful to take a subst of the rows and columns in a matrix. Here we take rows 2 and 3 and columns 1 and 2:

Here we take columns 1 and 2; by leaving the rows component empty every row is returned:

There is a strange convention in R that, by default, if we select a sub-matrix with only one row or one column, the result will be converted from a rectangular matrix to a non-dimensional vector. Notice that the output below is not given as a matrix with one column:

Usually this is not a problem, but if we want to be safe
we can add the option `drop = FALSE`

to the subset command.
Notice the difference in the output here compared to the
output above:

Finally, we can also subset by giving a logical statement in either the rows or columns spot. Here we select only those rows where the numbers 1 through 5 are greater than 3:

As expected, the output is a matrix with just two rows.

## Multivariate Linear Models

### Matrix Formulation

We have been working with multivariate linear models over the past few classes, though I have only ever written the formal equation in the case where there are two explanatory variables. In general, multivariate regression represents the following model:

For simplicity, we won’t include an explicit intercept term in the model. If we want one, we will just make the first variable $x_{1,i}$ equal to one for every value of i.

The statistical estimation problem is to estimate the p components of the multivariate vector beta.

Using our matrix notation, we can write the linear model simultaneously for all observations:

Which can be compactly written as:

The matrix X is known as the **design matrix** or **model matrix**.
For reference, note the following equation yields these dimensions:

## Fitting Linear Models with Matricies

Note that the bikes data, like every other dataset we have used,
is not a matrix. It is something that R calls a **data frame**:

While both matrices and data frames have data organized in rows and columns, matrices force all of the data to be numeric (its actually more complicated than this in R, but just go along with it for now). A data frame on the other hand can have different variable types in each column.

An easy way to construct a matrix in R, is to use the `select`

function
to grab only numeric columns and the function `as.matrix`

to convert
the output into a matrix object. We will throughout the course use the
notation that the variable y is a vector holding the response of
interest and X is a matrix containing the variables we want to use in
a model.

We can then create specific training and validation sets using the logical subset method from above:

So this generally looks good, but we left out the intercept. Recall
that our matrix formulation required us to add an explicit column of
1’s if we wanted an intercept. Let’s do this directly in the matrix
X using the `cbind`

function, and repeat:

## Using lm.fit

We have seen how to use the `lm`

function to quickly fit
linear regression directly from data frames. Now, how do
we actually fit a linear model once we have these matricies?
Next class we will see how to do this directly with matrix
operations. There is an intermediate function that solves the
linear regression problem directly from our matricies called
`lm.fit`

. In fact, the `lm`

function internally calls this function.
As inputs, it takes just the X matrix and response y. There
is a lot of diagnostic output, but we will take just the
`coef`

component, corresponding to the coefficents matrix:

We can create predicted values for the whole dataset by matrix
multiplying `X`

with `beta`

:

Let’s verify that this gives the same output as the `lm`

function:

And, as hoped, it does!

## Another (The Best!) Way to Make Model Matricies

When we simply want to set the model matrix X to a subset
of the numeric columns in our data frame, the function
`as.matrix`

is usually sufficient. The formula interface
to `lm`

is incredibly useful however when using categorical
variables or when processing numeric variables by special
functions such as `poly`

.

The function `model.matrix`

allows us to compute the model
matrix from the formula interface. In fact, the `lm`

function
calls this to convert our inputs into a model matrix. The
output of this is then passed to `lm.fit`

. It will also, by
default, include an intercept term for us. Here we use it
to build the same model matrix as before:

Notice that the intercept has been added for us. What is nice
about this formulation is that we can use commands like `poly`

and
factor (or just raw character vectors) and have R take care of
all the hard work for us:

We will begin next class by showing some best practices for how to build model matrices. At that point we will be able to apply more models that require us to put data into a matrix format.

### Indicator variables

Notice that each of the season variables is an indicator for which season of the year each day is. Let’s look at the first 6 carriers to verify this:

If we look at a table of all the carriers, we see that one of the carriers is missing:

The missing value “Fall” is known as the **baseline** in this model.
The intercept indicates the expected rentals for a day in the Fall.
All of the other terms give how much more or less each season has
more or less expected rentals relative to the baseline “Fall”.

Changing the baseline will change all of the beta coefficients. However, the predicted values will remain the same. In 209 and 289, I spend a lot of time talking about changing the baseline and understanding the model from different perspectives. As we are focused on prediction, which is unchanged, this will be much less of a concern for us. Just note that by default variables are sorted in alphabetical order, which is why “Fall” is the baseline here.