Class 17: Photos!
Today we begin our study of image data. We will spend about two weeks working with several image datasets, building up to state-of-art methods such as convolutional neural networks.
Image data is similar in many ways to text data. Both are relatively unstructured and we generally need to perform some sort of feature extraction in order to do predictive modelling. Image data is stored (or at least can be represented) in a numeric format giving the pixel intensities of the file. Because of this is may seem easier than text data where we do not even start with numbers. By most measures this is not the case as images have many more difficulties that outweight this one benefit.
One difficulty of image data is that the files become very large. You will see that the images we work with are fairly low resolution and yet they still take many gigabyte for just a few thousand images.
My original draft of notes dove head-first into some high level image analysis. I took a step back and decided it would be beneficial to everyone to start with a gentle introduction that shows how to work with image data. As such, note that like the last class much of today’s notes are for your general knowledge and not strictly required for completing the labs.
Images to numbers and back again
You all were nice enough to run around outside last week to
take some photos for class. Let’s not let those go to waste!
The first step in working with the images is to read each
image into R. There are few different functions for doing
this. To start, I will use the
readJPEG function from the
jpeg library. We will start with a single example:
The data is stored as an array. This is a generalization of a matrix beyond two dimensions. We can think of the array as storing three matricies of the same size, with one column per column of pixels, one row per row of pixels, with the red, green, and blue intensities storied as their own tables.
The numbers are stored as a value between 0 and 1. Note that if all the color channels are 1 the color is “white” and if all pixels are 0 the color is “black”. We can do some simple math here to see which of the colors is the most dominant in this photograph
Red is the most active color, followed closely by green and then finally by blue. What we actually want to do though is see the image itself. This block of code plots it in R while preserving the correct aspect ratio.
I choose this in part because I like that is a photo of someone taking a photo of someone else. We can see that the red comes from the brick in the photo and the green from the trees. The blue, which is not as prominent, mostly comes from the sky.
This all seems surprisingly easy. Let’s try another example.
Oh no! The image is flipped on its side. This is because many cell phones do not actually flip the digital image directly. Instead, they save the original image and attatch some metadata that indicates the orientation of the image.
We now have two additional steps: we need to read this metadata and then rotate the image. We will do this by way of two different packages. The exif package returns
There is a lot of information here about the phone and the photograph. We even see information such as whether the flash was used the digital “speed” of the sensor. Importantly, there is also a number called orientation. It turns out that “6” means we need to rotate the image 90 degrees (“3” means 180 and “8” is 270). Other numbers indicate forms of mirror images that did not appear in this corpus of images.
In order to easily rotate the image, we will use the EBImage
package to load the image. We can then use the
that operates on an EBImage object to rotate the image.
Now, the image looks great but the file is very large. Trying to load a few thousand of these will not be feasible if we want to store everything in memory. Instead, we will downsample to an image that is only 224-by-244 pixels. This also stretches the image, but for our purpose here that should not be a problem.
While grainy, we can still make out the basic subjects of the image. Most importantly, it is still obvious that the image has two people in it and it was taken outdoors.
It will also be useful to have an even smaller version of the image. This is just a thumbnail, but still contains a lot of information that we can use for detecting whether an image was taken inside or outside.
Basically, this is all I did to the entire dataset, while also taking care to construct a response variable, training flags, and the observation ids.
Fun with exif
There are several interesting things that we can do with the exif metadata attached to your images. You may even be surprised (and a little worried) about what I can do, actually. Let’s grab the exif data for all of the images that you uploaded. A few of the images have invalid exif data, so the following block of code removes these.
Notice that two of the columns in this dataset are longitude and latitude. What if we plot these over a map (perhaps trimming a few outliers)? We can even use the time variable to show when these photos were taken.
Not all of you have location data turned on, so this is only those photos with non-missing location data. But still, it is is interesting how accurate these values seem to be. Don’t worry, I won’t be sharing any geolocation data publically (notice the chunks above call local files on my computer), I just wanted to illustrate how pervasive this kind of information is.
Classifying Class Data
Now that we know roughly how to turn image data into arrays, let’s load the entire corpus from class. The data is split into two parts, much like the annotated data you made for today’s class. The first is a small data frame that gives just the metadata:
The arrays containing the pixels can be read with the following code:
The dimension of this array gives the number of rows, the width of the image, the height of the image, and the number of channels in the image. The channels here refer to the red, green, and blue values.
Today, we’ll start by using the
apply function to collapse this into
a two-dimensional array. This has one row for each observation and
one column for each pixel value.
Now that we have data in a standard matrix, we can throw this into an elastic net model.
Do you see any patterns in these numbers?
Every value in the first 1000 is negative. There are only two values from 1000-2000, and all the values in the 3000s are postive. It turns out that the first 1024 (32 * 32) columns are the red channels, the next 1024 are the green channels, and the final 1024 are the blues. So there are negative weights on the red channels, very small weights on green, and positive weights on blue. Does that make sense?
We see that this model is fairly predictive:
It may help in this case to set the alpha parameter close to zero so the weights get spread out over more of the photograph. Here we see how this offers a slight improvement in the validation set.
Finding negative examples, validation points that we did not classify correctly, is an important task in image classification much as it is with text classification. To look at the negative examples, I will read in the larger 224x224 images:
First, let’s plot the examples taken inside but that our algorithm thinks were taken outside:
Do you notice any interesting patterns here?
Now, let’s see outdoor images that were classified as being inside:
What do many of these images have in them (hint: we have a lot of this at UR).
Notice that we can do a PCA-based visualization of this data much as we did with the text data in the last class.
The plot shows a pattern differentiating the indoor and outdoor photographs, but not a particularly clear one. We will look at more PCA plots next week.
Finally, we can construct new meta features by using our knowledge of how the columns of X are organized. For example, here I will take the average of the red, green, and blue channels and fit a new model on just these.
Notice that the final model is almost as good as the one fit on the entire dataset. As a benefit, it does not over-fit the training data.
Your assignment for the next lab is to work with this data and to try to find a creative way to do prediction. At a minimum, at least show that you can apply and understand what I did in class today.