Tutorial 28: Deep learning for images

The real benefit of neural networks is seen when they are applied to high-dimensional tasks such as image classification. As I mentioned last time, deep neural networks can perform automatic dimensionality reduction in their first layers and prediction in the upper layers, all at the same time.

For today, let's load in all of the libraries we will need for the tutorial.

In [1]:
import wiki
import iplot
import wikitext

import numpy as np
import matplotlib.pyplot as plt
import sklearn
Loading BokehJS ...
In [2]:
assert wiki.__version__ >= 6
assert wikitext.__version__ >= 2
assert iplot.__version__ >= 3
In [3]:
from keras.applications.vgg19 import VGG19, preprocess_input, decode_predictions
from keras.preprocessing import image
from keras.models import Model
Using TensorFlow backend.

VGG-19

Rather than building a model from scratch, which is hard and time consuming, let's grab a pretrained image processing model directly. The model here is called VGG19; it was designed for a well-known image classification challenge known as the ILSVRC (ImageNet Large Scale Visual Recognition Challenge). (Note: the first time you run this code it may take a few minutes, because keras has to download the pretrained weights.)

In [4]:
vgg19_full = VGG19(weights='imagenet')
vgg19_full.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________

This is an impressively large neural network. It has over 143 million parameters!

What kind of input and output is expected in the model?
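From the summary, the model expects batches of 224-by-224 RGB images (shape (None, 224, 224, 3)) and outputs a probability for each of 1000 categories (shape (None, 1000)). As a quick sanity check of my own (not part of the original notebook), the parameter counts in the summary follow directly from the layer shapes: a Conv2D layer with a 3x3 kernel has 3 * 3 * c_in * c_out weights plus c_out biases, and a Dense layer has n_in * n_out weights plus n_out biases.

```python
def conv2d_params(c_in, c_out, k=3):
    # k-by-k kernel applied to c_in input channels, producing c_out filters,
    # plus one bias per filter
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # fully connected weight matrix plus one bias per output unit
    return n_in * n_out + n_out

print(conv2d_params(3, 64))       # block1_conv1 -> 1792
print(dense_params(25088, 4096))  # fc1 -> 102764544
print(dense_params(4096, 1000))   # predictions -> 4097000
```

Note that the two fully connected layers at the top account for the bulk of the 143 million parameters.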

Loading an image file

I've written a small function to load an image file from a URL. We need a bit of boilerplate code because keras's image.load_img function will not load an image over the internet directly (only from a local file).

In [5]:
def load_image(link, target_size=None):
    import requests
    import shutil
    import os
    
    # note: ext includes the leading dot, e.g. ".jpg"
    _, ext = os.path.splitext(link)
    
    # stream the file from the URL to a local temporary path
    r = requests.get(link, stream=True)
    with open('temp' + ext, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
        
    # load the local copy and convert it to a numpy array
    img = image.load_img('temp' + ext, target_size=target_size)
    return image.img_to_array(img)

Let's test out the function by grabbing an image of a dog.

In [6]:
img_path = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Golden_retriever_eating_pigs_foot.jpg/170px-Golden_retriever_eating_pigs_foot.jpg"
img = load_image(img_path, target_size=(224, 224))
img.shape
Out[6]:
(224, 224, 3)

Notice that this is a three-dimensional array containing the red, green, and blue pixel intensities. As requested, the image has been resized to 224-by-224 pixels.
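To make the layout concrete, here is a minimal sketch (using a synthetic array rather than the downloaded image, so it runs on its own): the axes are (height, width, channels), with the channels in RGB order.

```python
import numpy as np

# a synthetic 224x224 image, filled in so that only the red channel is lit
img = np.zeros((224, 224, 3))
img[:, :, 0] = 255.0

red = img[:, :, 0]   # slicing the last axis gives a single 224x224 channel
print(red.shape)     # (224, 224)
print(img[0, 0])     # one pixel as an RGB triple: [255., 0., 0.]
```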

We can see the image in Python by calling the imshow function, but we first need to scale the pixel intensities from the range 0-255 down to the range 0-1 by dividing by 256.

In [7]:
plt.imshow(img / 256)
Out[7]:
<matplotlib.image.AxesImage at 0xd3fa898d0>

We need to do some pre-processing on the image before sending it to keras: expand_dims adds a batch dimension, and preprocess_input applies the same transformations that were used when VGG19 was trained.

In [8]:
x = np.expand_dims(img, axis=0)
x = preprocess_input(x)
x.shape
Out[8]:
(1, 224, 224, 3)
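For the curious, my understanding is that for VGG-style models preprocess_input converts the channels from RGB to BGR and subtracts the per-channel ImageNet means. The sketch below is a hypothetical re-implementation for illustration only (the mean constants come from the original Caffe reference models; treat them as an assumption, not the notebook's own code):

```python
import numpy as np

# ImageNet per-channel means, in BGR order (from the Caffe reference models)
IMAGENET_MEANS_BGR = np.array([103.939, 116.779, 123.68])

def vgg_preprocess(img_rgb):
    """Hypothetical sketch of VGG preprocessing for a (h, w, 3) RGB array."""
    x = img_rgb[..., ::-1].astype('float64')  # reverse channels: RGB -> BGR
    x = x - IMAGENET_MEANS_BGR                # zero-center each channel
    return x

# a white pixel maps to 255 minus each channel mean
white = np.full((1, 1, 3), 255.0)
print(vgg_preprocess(white)[0, 0])
```

The key point is that the network sees zero-centered values, not raw 0-255 intensities, so skipping this step gives much worse predictions.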

Finally, let's run this image through the VGG19 model.

In [9]:
probs = vgg19_full.predict(x)
probs[0].shape
Out[9]:
(1000,)

To assist in interpreting the output, keras provides a helper function decode_predictions that converts the vector of probabilities into the most likely categories. How well does the model do here?

In [10]:
decode_predictions(probs, top=15)
Out[10]:
[[('n02099601', 'golden_retriever', 0.6441851),
  ('n02102318', 'cocker_spaniel', 0.16954248),
  ('n02099712', 'Labrador_retriever', 0.06527001),
  ('n02100735', 'English_setter', 0.047707632),
  ('n02101556', 'clumber', 0.026033377),
  ('n02104029', 'kuvasz', 0.010793093),
  ('n02111500', 'Great_Pyrenees', 0.008516561),
  ('n04409515', 'tennis_ball', 0.002479528),
  ('n02091635', 'otterhound', 0.0019789534),
  ('n02102480', 'Sussex_spaniel', 0.0014972917),
  ('n02101006', 'Gordon_setter', 0.0012067808),
  ('n02097474', 'Tibetan_terrier', 0.0009656719),
  ('n04254680', 'soccer_ball', 0.0008964036),
  ('n02088238', 'basset', 0.00088096445),
  ('n02097298', 'Scotch_terrier', 0.0007966581)]]

It correctly predicts that this is a golden retriever.

Please do not take for granted how amazing this result is: something like this would have been impossible to create less than a decade ago.

Experimenting with VGG19 — Part I

For the rest of class, you are going to do some experimentation with the VGG19 model. To start, take a look at some of the categories available for prediction in the challenge:

You might also take a look at a summary paper about the collection:

Your task here is to select 12 different image categories (try to pick a range of object types) and see how well the model performs on each. Use the paper (see figure 15) to pick at least one "difficult" category.

(Hint: I've copied the self-contained code for the golden retriever example below; copy it 12 times and just modify the URL.)

In [11]:
img_path = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Golden_retriever_eating_pigs_foot.jpg/170px-Golden_retriever_eating_pigs_foot.jpg"
img = load_image(img_path, target_size=(224, 224))
plt.imshow(img / 256)
x = np.expand_dims(img, axis=0)
x = preprocess_input(x)
decode_predictions(vgg19_full.predict(x), top=15)
Out[11]:
[[('n02099601', 'golden_retriever', 0.6441851),
  ('n02102318', 'cocker_spaniel', 0.16954248),
  ('n02099712', 'Labrador_retriever', 0.06527001),
  ('n02100735', 'English_setter', 0.047707632),
  ('n02101556', 'clumber', 0.026033377),
  ('n02104029', 'kuvasz', 0.010793093),
  ('n02111500', 'Great_Pyrenees', 0.008516561),
  ('n04409515', 'tennis_ball', 0.002479528),
  ('n02091635', 'otterhound', 0.0019789534),
  ('n02102480', 'Sussex_spaniel', 0.0014972917),
  ('n02101006', 'Gordon_setter', 0.0012067808),
  ('n02097474', 'Tibetan_terrier', 0.0009656719),
  ('n04254680', 'soccer_ball', 0.0008964036),
  ('n02088238', 'basset', 0.00088096445),
  ('n02097298', 'Scotch_terrier', 0.0007966581)]]

Experimenting with VGG19 — Part II

Now, the second experiment asks you to dust off your HTML parsing skills. Write code that starts with a Wikipedia page and prints out the predicted categories for each image on the page (you can filter the images if you would like).

Test it on something interesting, like the page about dogs. You can, but do not need to, wrap it up as a function.
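One possible starting point is sketched below, using only the standard library for the parsing half. Fetching the page HTML (e.g. with requests) and the prediction loop are left as comments, since they depend on the load_image function and vgg19_full model defined above.

```python
from html.parser import HTMLParser

class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)

def image_links(html):
    parser = ImgSrcParser()
    parser.feed(html)
    # Wikipedia image links are often protocol-relative ("//upload..."),
    # so prepend a scheme where needed
    return ['https:' + s if s.startswith('//') else s for s in parser.srcs]

html = '<p>A dog</p><img src="//upload.wikimedia.org/example.jpg">'
print(image_links(html))  # ['https://upload.wikimedia.org/example.jpg']

# For each link you would then run something like:
#     img = load_image(link, target_size=(224, 224))
#     x = preprocess_input(np.expand_dims(img, axis=0))
#     print(decode_predictions(vgg19_full.predict(x), top=5))
```

You could equally use a third-party parser such as BeautifulSoup; the standard-library version just keeps the sketch self-contained.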