In this lab we will experiment with convolutional neural networks (CNNs) to process images. We will use Keras with TensorFlow. Here are installation instructions for Keras: https://keras.io/#installation, and here are installation instructions for TensorFlow: https://github.com/tensorflow/tensorflow#download-and-setup. Keras is a high-level front end for TensorFlow that lets you build networks from ready-made layers on top of the primitive operations implemented in TensorFlow.
We will take a set of training images and sentences from the MS-COCO dataset (400k sentences) and train our network to detect images that contain women vs. men, based on information from the captions.
First, let's import libraries and make sure you have everything properly installed.
import tensorflow as tf
import numpy as np
import random, json, string, pickle
import keras
import keras.layers
import keras.models
import keras.optimizers
import keras.callbacks
from keras.preprocessing import image
import keras.applications.vgg16 as vgg16
import keras.applications.resnet50 as resnet50
import matplotlib.pyplot as plt
from nltk import word_tokenize
%matplotlib inline
Here we will load the VGG network proposed by Simonyan & Zisserman in "Very Deep Convolutional Networks for Large-Scale Image Recognition".
model = vgg16.VGG16(weights='imagenet')
model.summary()
The last dense layer "predictions" uses a softmax activation, so the outputs correspond to probabilities for the 1000 classes in the ImageNet ILSVRC task. We will load an image and make some predictions with this network. I took a picture of my toaster to see what the network outputs. It did quite well.
img_path = 'test_image.jpg' # This is an image I took in my kitchen.
img = image.load_img(img_path, target_size=(224, 224))
img_arr = image.img_to_array(img)
x = np.expand_dims(img_arr, axis=0) # The model only accepts batches so we add a dummy dimension.
x = vgg16.preprocess_input(x) # The preprocessing should be the same that was used during training.
predictions = model.predict(x)
label_predictions = vgg16.decode_predictions(predictions, top = 10)
print('Input image size:', x.shape)
print('Prediction scores: ', predictions.shape)
print('\nPredictions:')
for (i, (category_id, name, probability)) in enumerate(label_predictions[0]):
    print('%d. %s(%.3f)' % (i, name, probability))
plt.imshow(np.asarray(img));
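To make the softmax output more concrete, here is a minimal NumPy sketch of what happens conceptually at the end of the network: raw scores are turned into probabilities, and the highest-scoring classes are reported. The class names and scores below are made up for illustration, and `softmax` and `top_k` are hypothetical helpers, not the Keras implementation of `decode_predictions`.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def top_k(probs, names, k):
    # Indices of the k largest probabilities, highest first.
    order = np.argsort(probs)[::-1][:k]
    return [(names[i], float(probs[i])) for i in order]

# Hypothetical raw scores for four made-up classes.
toy_names = ['toaster', 'microwave', 'kettle', 'dog']
toy_scores = np.array([4.0, 2.0, 1.0, -1.0])
toy_probs = softmax(toy_scores)

for rank, (name, p) in enumerate(top_k(toy_probs, toy_names, k=3)):
    print('%d. %s (%.3f)' % (rank, name, p))
```

The probabilities always sum to 1, which is why a softmax output suits a task like ILSVRC where each image has exactly one label.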
We will use the CNN outputs at the "fc2" layer as features in our task of detecting 80 object categories from the MS-COCO dataset. Apart from image descriptions, the MS-COCO dataset contains annotations for 80 object categories with bounding boxes. In this lab we will not use the bounding boxes; instead, the label for each image is an 80-dimensional binary vector where each entry indicates whether at least one instance of the corresponding object category is present.
mscoco_objs = json.load(open('annotations/instances_train2014.json'))
print(mscoco_objs.keys())
print(mscoco_objs['categories'][0])
print(mscoco_objs['annotations'][0])
imageIds = list(set([entry['id'] for entry in mscoco_objs['images']]))[:50000]
imageId2Name = {entry['id']: entry['file_name'] for entry in mscoco_objs['images']}
imageId2index = {image_id: idx for (idx, image_id) in enumerate(imageIds)}
categoryId2index = {entry['id']: idx for (idx, entry) in enumerate(mscoco_objs['categories'])}
labels = np.zeros((len(imageIds), 80))
print('Computing labels', end='')
simageids = set(imageIds)
for (i, entry) in enumerate(mscoco_objs['annotations']):
    if entry['image_id'] not in simageids: continue
    if i % 10000 == 0: print('.', end='')  # Progress indicator on the same line.
    image_id = entry['image_id']
    category_id = entry['category_id']
    labels[imageId2index[image_id], categoryId2index[category_id]] = 1
print('\n')
print('Labels ', labels.shape)
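The label construction can be easier to follow on a tiny example. The sketch below builds the same kind of binary label matrix from a handful of made-up (image id, category id) pairs; all ids and variable names here are hypothetical. Note that the category ids need their own index mapping because MS-COCO category ids are not contiguous.

```python
import numpy as np

# Hypothetical toy annotations: (image_id, category_id) pairs.
toy_annotations = [(10, 1), (10, 3), (20, 3), (10, 1)]
toy_image_ids = [10, 20, 30]
toy_category_ids = [1, 2, 3]

# Map arbitrary ids to row and column positions in the label matrix.
image_to_row = {image_id: row for row, image_id in enumerate(toy_image_ids)}
category_to_col = {cat_id: col for col, cat_id in enumerate(toy_category_ids)}

toy_labels = np.zeros((len(toy_image_ids), len(toy_category_ids)))
for image_id, category_id in toy_annotations:
    # Repeated instances of the same category still produce a single 1.
    toy_labels[image_to_row[image_id], category_to_col[category_id]] = 1

print(toy_labels)
```

Image 30 has no annotations, so its row stays all zeros. The same can happen for real images that contain none of the 80 categories.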
As mentioned above, we will use the CNN outputs at the "fc2" layer as features for detecting the 80 object categories, so we need to remove the "predictions" layer. Let's remove it and verify that it is gone by printing the model summary.
# Remove last Linear/Dense layer.
model = vgg16.VGG16(weights='imagenet')
model.layers.pop()
model.outputs = [model.layers[-1].output]
model.layers[-1].outbound_nodes = []
model.summary()
Let's compute features for all images in the training dataset. You will need to download the training images from the MS-COCO webpage. I'm including the pre-computed vgg16_features.p file so you don't really need to run this code but I encourage you to do it so you have a better idea of computation times for your project.
features = np.zeros((len(imageIds), 4096), dtype=np.float32)
batch_size = 500
n_batches = len(imageIds) // batch_size  # Integer division so that range() receives an int.
index = 0
for b in range(0, n_batches):
    batch = np.zeros((batch_size, 224, 224, 3))
    print(('Computing features for batch %d of %d') % (b + 1, n_batches))
    for i in range(0, batch_size):
        img_path = '/data/data/coco/train2014/' + imageId2Name[imageIds[index]]
        img = image.load_img(img_path, target_size=(224, 224))
        img = image.img_to_array(img)
        batch[i, :, :, :] = img
        index = index + 1
    print(('Batch loaded for batch %d of %d') % (b + 1, n_batches))
    batch = vgg16.preprocess_input(batch)
    features[b * batch_size : (b + 1) * batch_size, :] = model.predict(batch)
# Pickle data is binary, so the file is opened in 'wb' mode.
with open('vgg16_features.p', 'wb') as f:
    pickle.dump({'features': features, 'imageIds': imageIds, 'labels': labels,
                 'imageId2Name': imageId2Name, 'imageId2index': imageId2index,
                 'categoryId2index': categoryId2index}, f)
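Pickle files hold binary data, so under Python 3 they have to be opened in binary mode ('wb' for writing, 'rb' for reading). Here is a minimal round-trip sketch on a small made-up dictionary. The temporary file name is only for this demonstration and is not one of the lab files.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the features dictionary.
toy_data = {'features': np.arange(6, dtype=np.float32).reshape(2, 3),
            'imageIds': [10, 20]}

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'toy_features.p')

with open(path, 'wb') as f:  # Binary mode for writing.
    pickle.dump(toy_data, f)

with open(path, 'rb') as f:  # Binary mode for reading.
    restored = pickle.load(f)

print(restored['features'].shape, restored['imageIds'])
```

Using `with` also makes sure the file is closed, and therefore fully written, before anything tries to read it back.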
Now that we have computed features for each image in our training data, we will train a model to predict the labels using a binary cross-entropy loss. Remember that this was one of the loss functions you implemented in the Deep Learning Lab. We define here a simple 2-layer neural network that takes the feature vectors as input and produces 80 probabilities, one per object category, using a sigmoid output layer.
# Let's define the model.
# 1. The inputs are feature vectors of size 4096 corresponding to each image.
x = keras.layers.Input(batch_shape = (None, 4096))
# 2. Then we add a linear layer with a ReLU activation, followed by batch normalization and dropout.
hidden = keras.layers.Dense(512, activation = 'relu')(x)
hidden = keras.layers.BatchNormalization()(hidden)
hidden = keras.layers.Dropout(0.5)(hidden)
# 3. Add another dense layer and pass its output through a sigmoid function.
predictions = keras.layers.Dense(80, activation = 'sigmoid')(hidden)
# 4. Define the inputs and outputs of the model and print a summary.
# Remember that one could have multiple inputs or outputs.
mlp_model = keras.models.Model(input = [x], output = [predictions])
mlp_model.summary()
# Define the optimization method, and its parameters.
#sgd = keras.optimizers.SGD(lr = 0.01, decay = 1e-2, momentum = 0.9, nesterov = True)
sgd = keras.optimizers.Adam(lr = 0.01, decay = 1e-2) # Maybe this works better?
# Define the loss function that will be used to compute the error between predictions and labels.
mlp_model.compile(loss='binary_crossentropy', optimizer = sgd, metrics=['accuracy'])
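As a reminder of what the binary cross-entropy loss measures, here is a small NumPy sketch that averages the per-entry loss over all images and categories. The function name, the clipping constant `eps`, and the toy arrays are assumptions made for illustration; the Keras implementation may differ in details.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities to avoid taking log(0).
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()

# Two toy images with three categories each.
toy_true = np.array([[1, 0, 0], [0, 1, 1]], dtype=np.float64)
good = np.array([[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]])  # Mostly right.
bad = np.array([[0.1, 0.9, 0.9], [0.9, 0.1, 0.1]])   # Mostly wrong.

print('good:', binary_cross_entropy(toy_true, good))
print('bad: ', binary_cross_entropy(toy_true, bad))
```

Each of the 80 outputs is treated as an independent yes/no decision, which is why a sigmoid (rather than a softmax) is used on the last layer: several objects can be present in the same image.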
Let's load the precomputed features.
# First load features from file if needed.
precomputed = pickle.load(open('vgg16_features.p', 'rb'))  # Pickle files must be read in binary mode.
features = precomputed['features']
imageIds = precomputed['imageIds']
imageId2Name = precomputed['imageId2Name']
imageId2index = precomputed['imageId2index']
categoryId2index = precomputed['categoryId2index']
labels = precomputed['labels']
Now let's train the model. (Note: again I'm including the mlp_model_weights.hdf5 file in the lab).
train_features = features[:40000, :]
train_labels = labels[:40000, :]
train_imageIds = imageIds[:40000]
val_features = features[40000:, :]
val_labels = labels[40000:, :]
val_imageIds = imageIds[40000:]
mlp_model.fit(train_features, train_labels,
              validation_data = (val_features, val_labels),
              nb_epoch = 20, batch_size = 128, shuffle = True);
mlp_model.save_weights('mlp_model_weights.hdf5')
Now that we have trained the model we can try running it on some data that we didn't use for training.
predictions = mlp_model.predict(val_features)
for ind in [2, 3, 4, 39, 14]:
    img_path = '/data/data/coco/train2014/' + imageId2Name[val_imageIds[ind]]
    img = image.load_img(img_path, target_size=(224, 224))
    img_arr = np.expand_dims(image.img_to_array(img), axis = 0)
    plt.figure()
    plt.imshow(np.asarray(img))
    vlabels = [mscoco_objs['categories'][idx]['name'] for idx in np.nonzero(val_labels[ind, :])[0]]
    vpreds = [(mscoco_objs['categories'][idx]['name'], predictions[ind, idx])
              for idx in np.nonzero(predictions[ind, :] > 0.75)[0]]
    plt.title('labels:' + str(vlabels) + '\npredictions:' + str(vpreds))
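Looking at a few images is useful, but it helps to also summarize the predictions with numbers. Because most of the 80 entries are 0 for any given image, plain accuracy can look high even for a model that predicts almost nothing, so precision and recall are often more informative. Below is a minimal sketch on made-up arrays using the same 0.75 threshold; `precision_recall` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def precision_recall(y_true, y_prob, threshold=0.75):
    y_pred = y_prob > threshold
    y_true = y_true.astype(bool)
    true_pos = np.logical_and(y_pred, y_true).sum()
    # Guard against division by zero when nothing is predicted or labeled.
    precision = true_pos / max(y_pred.sum(), 1)
    recall = true_pos / max(y_true.sum(), 1)
    return precision, recall

# Two toy images with four categories each.
toy_true = np.array([[1, 0, 0, 0], [0, 1, 1, 0]])
toy_prob = np.array([[0.9, 0.8, 0.1, 0.0], [0.2, 0.95, 0.5, 0.1]])

precision, recall = precision_recall(toy_true, toy_prob)
print('precision: %.2f, recall: %.2f' % (precision, recall))
```

On the lab data, one could call the same function with `val_labels` and the `predictions` array, assuming both have shape (number of images, 80).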
In our earlier model, we first extracted features and then trained a separate model on top of them. Here we will define a single model that integrates the feature extractor (VGG-16) with the 80-category classifier that we defined before.
# Let's define the model.
# 1. The inputs are now batches of RGB images of size 224x224.
x = keras.layers.Input(batch_shape = (None, 224, 224, 3))
# Load the VGG16 model without its fully connected layers (include_top = False).
vgg = vgg16.VGG16(weights='imagenet', include_top = False)
for layer in vgg.layers: # Maybe let's keep the convolutional layers frozen for faster processing.
    layer.trainable = False
feats = vgg(x) # "vgg" is the VGG-16 network without fully connected layers.
feats = keras.layers.Flatten()(feats) # Flatten the output volume of the last convolutional block.
# 2. Then we add a linear layer with a ReLU activation, followed by batch normalization and dropout.
hidden = keras.layers.Dense(512, activation = 'relu')(feats)
hidden = keras.layers.BatchNormalization()(hidden)
hidden = keras.layers.Dropout(0.5)(hidden)
# 3. Add another dense layer and pass its output through a sigmoid function.
predictions = keras.layers.Dense(80, activation = 'sigmoid')(hidden)
# 4. Define the inputs and outputs of the model and print a summary.
# Remember that one could have multiple inputs or outputs.
full_model = keras.models.Model(input = [x], output = [predictions])
full_model.summary()
# Define the optimization method, and its parameters.
#sgd = keras.optimizers.SGD(lr = 0.01, decay = 1e-2, momentum = 0.9, nesterov = True)
sgd = keras.optimizers.Adam(lr = 0.001, decay = 1e-6) # Maybe this works better?
# Define the loss function that will be used to compute the error between predictions and labels.
full_model.compile(loss='binary_crossentropy', optimizer = sgd, metrics=['accuracy'])
Now let's write the code for training the above model.
import random
# We need to rely on this because we cannot load 50k images into memory at the same time.
def DataGenerator(imageIds, imageLabels, batch_size):
    batch = np.zeros((batch_size, 224, 224, 3))
    labels = np.zeros((batch_size, 80))
    while True:
        for i in range(0, batch_size):
            index = random.randint(0, len(imageIds) - 1)
            img_path = '/data/data/coco/train2014/' + imageId2Name[imageIds[index]]
            img = image.load_img(img_path, target_size=(224, 224))
            img = image.img_to_array(img)
            batch[i, :, :, :] = img
            labels[i, :] = imageLabels[index, :]
        batch = vgg16.preprocess_input(batch)
        yield batch, labels
# Why use class weights?
class_weight = (1 - train_labels).sum(axis = 0) / train_labels.sum(axis = 0)
# The method also does multi-threaded loading for you so you can load batches while the GPU is busy.
samples_per_epoch = len(train_imageIds) // 10  # Integer division: sample counts must be ints.
full_model.fit_generator(DataGenerator(train_imageIds, train_labels, 50), samples_per_epoch, nb_epoch = 10,
                         validation_data = DataGenerator(val_imageIds, val_labels, 50),
                         nb_val_samples = len(val_imageIds) // 20,
                         nb_worker = 3, max_q_size = 4, pickle_safe = True,
                         class_weight = class_weight)
full_model.save_weights('full_model_weights.hdf5')
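On the question of why one would use class weights: most categories appear in only a small fraction of images, so a model can reach a low loss by predicting "absent" everywhere. Weighting the positive examples of rare categories more heavily makes those mistakes more costly. The sketch below illustrates the idea in NumPy on made-up labels. The `weighted_bce` function is a hypothetical illustration of the principle and is not necessarily how Keras applies its `class_weight` argument.

```python
import numpy as np

# Toy labels for 8 images and 2 categories.
toy_labels = np.zeros((8, 2))
toy_labels[0, 0] = 1   # Category 0 is rare: 1 positive out of 8.
toy_labels[:4, 1] = 1  # Category 1 is balanced: 4 positives out of 8.

positives = toy_labels.sum(axis=0)
negatives = (1 - toy_labels).sum(axis=0)
# np.maximum avoids dividing by zero for a category with no positives.
pos_weight = negatives / np.maximum(positives, 1)
print('positive weights:', pos_weight)

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    # Binary cross-entropy where errors on positives are scaled per category.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(pos_weight * y_true * np.log(y_prob)
               + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()

# A lazy model that says "absent" for everything.
always_negative = np.full_like(toy_labels, 0.01)
unweighted = weighted_bce(toy_labels, always_negative, np.ones(2))
weighted = weighted_bce(toy_labels, always_negative, pos_weight)
print('unweighted: %.3f, weighted: %.3f' % (unweighted, weighted))
```

With the weights in place, the lazy model is penalized much more for missing the single positive example of the rare category.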
for ind in [2, 3, 4, 39, 14]:
    img_path = '/data/data/coco/train2014/' + imageId2Name[val_imageIds[ind]]
    img = image.load_img(img_path, target_size=(224, 224))
    img_arr = np.expand_dims(image.img_to_array(img), axis = 0)
    plt.figure()
    plt.imshow(np.asarray(img))
    img_arr = vgg16.preprocess_input(img_arr)
    predictions = full_model.predict(img_arr)
    #print(predictions)
    vlabels = [mscoco_objs['categories'][idx]['name'] for idx in np.nonzero(val_labels[ind, :])[0]]
    vpreds = [(mscoco_objs['categories'][idx]['name'], predictions[0, idx])
              for idx in np.nonzero(predictions[0, :] > 0.75)[0]]
    plt.title('labels:' + str(vlabels) + '\npredictions:' + str(vpreds))
| | random | color-feature | HoG-feature | VGG16-feature | Resnet50-feature |
|---|---|---|---|---|---|
| BLEU-1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |