Image Classification
Classify images by whether they show pizza or steak.
This data will be sourced from the Food-101 dataset, particularly a subdivision of the images that only include pizzas and steaks.
Notebook Goals
- use a pre-built set of images from the web
- build & experiment with machine-learning models
- compare binary-classification against CNN
- address over-fitting by utilizing
  - MaxPool layers to reduce the number of "features" that the model is dealing with
  - data augmentation to train the model with images that are "imperfect" to mimic real-world imperfect photos
  - shuffling training data: reduce the chances of "learning" made by the order of input data
References
- cnn explainer
- Paper on Accelerating CNN
- ImageNet: an image db
Pre-Built Image-recognition Models
Check out some pre-built models for image recognition:
import zipfile
import os
import pathlib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Activation
from tensorflow.keras import Sequential
import pandas as pd
Download & Inspect Data
# Download zip file of pizza_steak images
fileName = 'pizza_steak.zip'
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip
# Unzip the downloaded file
zip_ref = zipfile.ZipFile(fileName, "r")
zip_ref.extractall()
zip_ref.close()
Inspect The Data
The data is in a dir, pizza_steak.
The dir has 2 subdirs, test and train.
Each subdir has 2 subdirs, pizza and steak.
Each pizza and steak dir has images included.
!ls pizza_steak/train/steak
#
# SUMMARY OF DATA
#
parentDir = 'pizza_steak'
# Walk through pizza_steak directory and list number of files
for dirpath, dirnames, filenames in os.walk(parentDir):
    if len(filenames) > 0:
        print(f"  {len(filenames)} images in '{dirpath}'.")
    else:
        print(f"DIR: '{dirpath}' has {len(dirnames)} dirs")
#
# GET CLASS NAMES
#
cleanPath = f'{parentDir}/train/'
trainingPath = pathlib.Path(cleanPath)
classNames = sorted([item.name for item in trainingPath.glob('*')])
npClassNames = np.array(classNames) # created a list of class_names from the subdirectories
print(npClassNames)
Preview Some Images
def view_random_image(target_dir, target_class):
    # Setup target directory (we'll view images from here)
    target_folder = target_dir + target_class
    # Get a random image path
    random_image = random.sample(os.listdir(target_folder), 1)
    # Read in the image and plot it using matplotlib
    img = mpimg.imread(target_folder + "/" + random_image[0])
    plt.imshow(img)
    plt.title(target_class)
    plt.axis("off")
    print(f"Image shape: {img.shape}")  # show the shape of the image
    return img
img = view_random_image(target_dir=cleanPath,
                        target_class="steak")
img
print(f'img shape: {img.shape}')
Key Points
- the data is a bunch of images
- the images are split into directories: test & train, then by classification (2 classifications)
- the shape of the images is 512x512 with a 3-color representation per pixel (probably rgb)
  - it has become common to reshape the images to fit a 224x224 size
- the rgb values
  - fit between 0-255 per channel, where (0, 0, 0) is "black" and (255, 255, 255) is "white"
  - 1st digit is red
  - 2nd digit is green
  - 3rd digit is blue
Build A Model: CNN
About Convolutional Neural Networks
Parts of a CNN:
- input (images)
- LAYERS & related details
  - input layer: batch_size, img dimensions, classification mode
  - convolution layer: figures out "the most important" patterns to learn, Conv2D
  - hidden activation: adds "non-linearity" to learned features, most typically relu
  - pooling layer: reduces dimensions of learned features, AvgPool2D, MaxPool2D
  - "fully connected" layer: a "last step" aggregating / refining the convolution layers, Dense
  - output layer: fits to the desired number of "classes" to learn
  - output activation: adds non-linearity to the output layer, sigmoid or softmax
A typical CNN structure:
Input -> Conv + ReLU layers (non-linearities) -> Pooling layer -> Fully connected (dense layer) as Output
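As a rough illustration of the two activations named above, here is a minimal plain-Python sketch of relu and sigmoid, plus the step of turning a sigmoid output into a class name. The helper names (`relu`, `sigmoid`, `to_class`) are made up for this example and are not part of Keras. The class order is an assumption: alphabetical, matching the pizza and steak sub-directories.

```python
import math

def relu(x):
    """Hidden activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)

def sigmoid(x):
    """Output activation for binary classification: squash any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumed class order: alphabetical, matching the pizza/steak sub-directories.
class_names = ["pizza", "steak"]

def to_class(raw_output):
    """Map a raw (pre-activation) output value to a class name."""
    probability = sigmoid(raw_output)
    return class_names[int(round(probability))]

print(relu(-2.5), relu(1.5))          # 0.0 1.5
print(sigmoid(0.0))                   # 0.5
print(to_class(-3.0), to_class(3.0))  # pizza steak
```

A single sigmoid unit is enough here because there are only two classes; softmax would be the usual choice once there are more.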
Prep the model Data
- get data into "training" and "testing" datasets
- "batch" the data: sub-sets of data to minimize the data loaded into memory (GPU or CPU) at once
- scale the image data-values to be between 0-to-1
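The batching and scaling steps above come down to simple arithmetic, sketched here in plain Python. The image count is a hypothetical number chosen for illustration, and `count_batches` and `scale_pixel` are made-up helper names, not Keras functions.

```python
import math

def count_batches(num_images, batch_size):
    """Number of batches needed to cover every image once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)

def scale_pixel(value, max_value=255):
    """Rescale one 0-255 channel value into the 0-1 range."""
    return value / max_value

# Hypothetical image count, for illustration only.
num_images = 1000
print(count_batches(num_images, batch_size=32))  # 32 (31 full batches + 1 partial batch of 8)
print([scale_pixel(v) for v in (0, 51, 255)])    # [0.0, 0.2, 1.0]
```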
# Set the seed
tf.random.set_seed(42)
imgW = 224
imgH = 224
maxScaleNumber = 255
#
# batch_size: limits amount of data in memory at once (in batches!)
# 32 has become a regular starting place in the machine-learning world
#
imagesInABatch = 32
# Preprocess data (get all of the pixel values between 1 and 0, also called scaling/normalization)
# rescaling normalizes 0-255 values to 0-1
# ImageDataGenerator DOCS (a lot there)
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./maxScaleNumber)
valid_datagen = ImageDataGenerator(rescale=1./maxScaleNumber)
# Setup the train and test directories
train_dir = "pizza_steak/train/"
test_dir = "pizza_steak/test/"
#
# BATCH the data
#
train_data = train_datagen.flow_from_directory(
    train_dir,
    batch_size=imagesInABatch,  # number of images to process at a time
    target_size=(imgW, imgH),   # convert all images to be 224 x 224
    class_mode="binary",        # type of problem we're working on
    seed=42)
valid_data = valid_datagen.flow_from_directory(
    test_dir,
    batch_size=imagesInABatch,
    target_size=(imgW, imgH),
    class_mode="binary",
    seed=42)
# for later modeling
test_data = valid_datagen.flow_from_directory(
    test_dir,
    batch_size=imagesInABatch,
    target_size=(imgW, imgH),
    class_mode="binary",
    seed=42)
print(f'how many batches in train_data? {len(train_data)}')
print(f'how many items in the first element of train_data? {len(train_data[0])}')
Inspect some training data
# Get a sample of the training data batch
images, labels = next(train_data)  # get the 'next' batch of images/labels
print(f'images:{len(images)}, labels:{len(labels)}')
# NOTICE LABELS:
# 0 or 1
print('labels:')
labels
Build the Model
# Create a CNN model (same as Tiny VGG - https://poloclub.github.io/cnn-explainer/)
imageW = 224
imageH = 224
imageColorCount = 3
convoFilterCount = 10
convoKernelCount = 3
maxPoolSize = 2
m1 = tf.keras.models.Sequential([
    Conv2D(filters=convoFilterCount,
           kernel_size=convoKernelCount,  # can also be (3, 3)
           activation="relu",
           input_shape=(imageW, imageH, imageColorCount)),  # first layer specifies input shape (height, width, colour channels)
    Conv2D(convoFilterCount, convoKernelCount, activation="relu"),
    MaxPool2D(pool_size=maxPoolSize,  # pool_size can also be (2, 2)
              padding="valid"),       # padding can also be 'same'
    Conv2D(convoFilterCount, convoKernelCount, activation="relu"),
    Conv2D(convoFilterCount, convoKernelCount, activation="relu"),  # activation='relu' is equivalent to adding a separate Activation('relu') layer
    MaxPool2D(maxPoolSize),
    Flatten(),
    Dense(1, activation="sigmoid")  # binary activation output
])
# Compile the model
m1.compile(loss="binary_crossentropy",
           optimizer=tf.keras.optimizers.Adam(),
           metrics=["accuracy"])
# Fit the model
m1History = m1.fit(train_data,
                   epochs=5,
                   steps_per_epoch=len(train_data),
                   validation_data=valid_data,
                   validation_steps=len(valid_data))
Inspect The Model Results
# Check out the m1 architecture
m1.summary()
Build A Model II: Binary Classification
Binary classification models built only from dense layers are significantly "simpler" than CNNs.
Let's see how a binary classification model performs on the same data.
Modeling steps
- Familiarize with the data (visualize, visualize, visualize...)
- "Preprocess" the data (prepare it for a model)
- Create a model (start with a baseline)
- Fit the model
- Evaluate the model
- Adjust different parameters and improve model (try to beat your baseline)
- Repeat Evaluate & Adjust until satisfied
tf.random.set_seed(42)
# Create a model to replicate the TensorFlow Playground model
m2 = tf.keras.Sequential([
    Flatten(input_shape=(imgW, imgH, 3)),  # dense layers expect a 1-dimensional vector as input
    Dense(4, activation='relu'),
    Dense(4, activation='relu'),
    Dense(1, activation='sigmoid')
])
# Compile the model
m2.compile(loss='binary_crossentropy',
           optimizer=tf.keras.optimizers.Adam(),
           metrics=["accuracy"])
# Fit the model
m2History = m2.fit(train_data,  # use same training data created above
                   epochs=5,
                   steps_per_epoch=len(train_data),
                   validation_data=valid_data,  # use same validation data created above
                   validation_steps=len(valid_data))
Inspect the Model
m1.summary()
m2.summary()
Build A Model III: Binary Adjusted
# Set random seed
tf.random.set_seed(42)
# Create a model similar to m2 but add an extra layer and increase the number of hidden units in each layer
m3 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(imgW, imgH, 3)),  # dense layers expect a 1-dimensional vector as input
    tf.keras.layers.Dense(100, activation='relu'),  # increase number of neurons from 4 to 100 (for each layer)
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),  # add an extra layer
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Compile the model
m3.compile(loss='binary_crossentropy',
           optimizer=tf.keras.optimizers.Adam(),
           metrics=["accuracy"])
# Fit the model
m3History = m3.fit(train_data,
                   epochs=5,
                   steps_per_epoch=len(train_data),
                   validation_data=valid_data,
                   validation_steps=len(valid_data))
Inspect The Model
m2.summary()
m3.summary()
Interestingly
- the binary classification models have WAY MORE PARAMETERS than the CNN
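The parameter gap can be checked by hand. The sketch below recomputes trainable parameter counts from the layer definitions of m1, m2 and m3 using the standard formulas for dense and convolutional layers. `dense_params` and `conv2d_params` are made-up helper names; the authoritative numbers are whatever `summary()` prints.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels * filters) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

flattened = 224 * 224 * 3  # 150,528 input values after Flatten

# m2: Dense(4) -> Dense(4) -> Dense(1)
m2_total = dense_params(flattened, 4) + dense_params(4, 4) + dense_params(4, 1)

# m3: Dense(100) x 3 -> Dense(1)
m3_total = (dense_params(flattened, 100) + 2 * dense_params(100, 100)
            + dense_params(100, 1))

# m1: four Conv2D(10, 3) layers, two 2x2 max-pools, then Dense(1).
# Spatial size: 224 -> 222 -> 220 -> pool 110 -> 108 -> 106 -> pool 53
m1_total = (conv2d_params(3, 3, 10) + 3 * conv2d_params(3, 10, 10)
            + dense_params(53 * 53 * 10, 1))

print(m1_total, m2_total, m3_total)  # 31101 602141 15073201
```

Almost all of the dense models' parameters sit in the first layer, where every one of the 150,528 input values connects to every unit. A convolutional layer reuses the same small kernel across the whole image, which is why the CNN stays so small.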
Build A Model: CNN "Baseline"
Setup a "simple" model to start with:
- 2D layers refer to the data having 2 "dimensions": height + width (color is a data attribute of each height/width pixel)
- filters: the number of "feature extractions", or "filters", that get "passed over" input tensors (10, 32, 64, 128). The higher the number, the more complex the model.
- kernel_size: describes the shape of the grid of pixels of the filter. The smaller, the more "fine-grained" the feature detection / filter will be
- stride: describes the movement of a kernel across the image (in pixels-per-stride)
- padding: whether or not to pad the image edges so the filter can be centred on every pixel. With 'valid' (no padding) and a stride of 1, a 3x3 filter fits at only 222 positions across a 224px-wide image, so the output is 222 wide; with 'same', the edges are padded and the output stays 224 wide
- features in a CNN are "significant" parts of images that the CNN has figured out
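To make the interaction of kernel_size, stride and padding concrete, here is a small sketch of the output-size arithmetic along one dimension. `conv_output_size` is a made-up helper, written on the assumption that it mirrors the usual 'valid' and 'same' conventions rather than calling Keras itself.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one dimension of a convolution."""
    if padding == "same":
        # "same" pads the input so that, at stride 1, the size is preserved.
        return -(-input_size // stride)  # ceiling division
    # "valid" uses no padding: the kernel must fit entirely inside the input.
    return (input_size - kernel_size) // stride + 1

print(conv_output_size(224, 3))                  # 222
print(conv_output_size(224, 3, padding="same"))  # 224
print(conv_output_size(224, 3, stride=2))        # 111
```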
Build
m4 = Sequential([
    Conv2D(filters=10,
           kernel_size=3,
           strides=1,
           padding='valid',
           activation='relu',
           input_shape=(224, 224, 3)),  # input layer (specify input shape)
    Conv2D(10, 3, activation='relu'),
    Conv2D(10, 3, activation='relu'),
    Flatten(),
    Dense(1, activation='sigmoid')  # output layer (specify output shape)
])
Compile
# Compile the model
m4.compile(loss='binary_crossentropy',
           optimizer=Adam(),
           metrics=['accuracy'])
# Fit the model
m4History = m4.fit(train_data,
                   epochs=5,
                   steps_per_epoch=len(train_data),
                   validation_data=test_data,
                   validation_steps=len(test_data))
Inspect & Compare
# m2.summary()
# m3.summary()
m4.summary()
Evaluate
Visualize CNN Model Stats
pd.DataFrame(m4History.history).plot(figsize=(10, 7));
# Plot the validation and training data separately
def plot_loss_curves(history):
    """
    Plots separate loss and accuracy curves for training and validation metrics.
    """
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    accuracy = history.history['accuracy']
    val_accuracy = history.history['val_accuracy']
    epochs = range(len(history.history['loss']))
    # Plot loss
    plt.plot(epochs, loss, label='training_loss')
    plt.plot(epochs, val_loss, label='val_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    # Plot accuracy
    plt.figure()
    plt.plot(epochs, accuracy, label='training_accuracy')
    plt.plot(epochs, val_accuracy, label='val_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();
# Check out the loss curves of m4
plot_loss_curves(m4History)
Beware Overfitting
Above, the val_loss goes UP after the 3rd epoch. Validation loss INCREASING while training loss keeps decreasing is a sign of over-fitting. Over-fitting is when the model gets excellent at predicting the data it was trained with, BUT loses the ability to predict NEW input as well.
Overfitting tends to happen when...
- a "large" number of convolutional layers is present
- a "large" number of convolutional filters is present
Overfitting tends to show up as...
- the "shape" of the validation accuracy curve-over-epochs changing from going up to flat &/or going down
- the "shape" of the validation loss curve-over-epochs flattening out, or turning upward, after going down
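One simple way to read a validation-loss curve programmatically is to look for the last epoch before the loss starts rising. The sketch below is an illustrative heuristic only: `last_good_epoch` is a made-up helper, and the loss values are hypothetical, not results from this notebook.

```python
def last_good_epoch(val_loss, patience=1):
    """Return the index of the last epoch before validation loss rose for
    `patience` consecutive epochs, or None if it never did."""
    rises = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] > val_loss[epoch - 1]:
            rises += 1
            if rises >= patience:
                return epoch - patience
        else:
            rises = 0
    return None

# Hypothetical per-epoch validation losses, not results from this notebook.
val_loss = [0.45, 0.38, 0.33, 0.36, 0.41]
print(last_good_epoch(val_loss))              # 2
print(last_good_epoch([0.5, 0.4, 0.3, 0.2]))  # None
```

Epoch indices start at 0 here, matching the x-axis of the loss plots.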
Adjust The Model
- build
- overfit
- reduce overfitting (by a few approaches):
  - adjust (reduce) number of convolutional layers
  - adjust number of convolutional filters
  - add dense layer to the output of the flattened layer
Here, we'll add a MaxPool2D layer after each convolutional layer.
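To show what a max-pooling layer does to a feature map, here is a minimal plain-Python sketch of 2x2 max pooling on a tiny 4x4 grid. `max_pool_2d` is a made-up helper and the grid values are invented; the real layer works on batched, multi-channel tensors.

```python
def max_pool_2d(grid, pool_size=2):
    """Max-pool a 2D list with a square window and a stride equal to the window size."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - pool_size + 1, pool_size):
        pooled_row = []
        for c in range(0, cols - pool_size + 1, pool_size):
            window = [grid[r + i][c + j]
                      for i in range(pool_size) for j in range(pool_size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
print(max_pool_2d(feature_map))  # [[6, 2], [2, 9]]
```

Each 2x2 window keeps only its largest value, so height and width are both halved and 16 values shrink to 4.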
Build
m5 = Sequential([
    # convo-then-maxPool
    Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),
    MaxPool2D(pool_size=2),  # halve height and width (a quarter of the values remain)
    # convo-then-maxPool
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    # convo-then-maxPool
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(1, activation='sigmoid')
])
Compile
m5.compile(loss='binary_crossentropy',
           optimizer=Adam(),
           metrics=['accuracy'])
Fit
# Fit the model
m5History = m5.fit(train_data,
                   epochs=5,
                   steps_per_epoch=len(train_data),
                   validation_data=test_data,
                   validation_steps=len(test_data))
Evaluate
Plot The Loss & Accuracy Curves
See the "loss" as epochs increase. Expecting loss to go down while epochs go on. When the validation loss starts to increase, the model is probably over-fitting. See the "accuracy" as epochs increase. Expecting accuracy to increase as epochs go on.
plot_loss_curves(m5History)
View Model Stats
m5.summary()
Build A Model: Data Augmentation
Data Augmentation
- alter training data
- give training data more "diversity", helping "generalize" patterns for the model to learn
- e.g. rotating, flipping, cropping
- help prevent over-fitting: force the model to "learn" from "imperfect" &/or "augmented" images, mimicking real-world "new" images that the model has not-yet seen
- NOTE: testing will be done on (regular) non-augmented images
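As a toy illustration of what an augmentation does to pixel data, the sketch below flips and shifts a tiny "image" stored as nested lists. `horizontal_flip` and `shift_right` are made-up helpers. Filling shifted-in pixels with 0 is a simplification chosen for this example and is not necessarily how ImageDataGenerator fills them.

```python
def horizontal_flip(image):
    """Mirror an image left-to-right; `image` is a list of rows of pixel values."""
    return [list(reversed(row)) for row in image]

def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

tiny_image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(horizontal_flip(tiny_image))  # [[3, 2, 1], [6, 5, 4]]
print(shift_right(tiny_image, 1))   # [[0, 1, 2], [0, 4, 5]]
```

The label of the image does not change under either transformation, which is what makes these safe augmentations for this task.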
Prep Data
# Create ImageDataGenerator training instance with data augmentation
train_datagen_augmented = ImageDataGenerator(
    rescale=1/255.,
    rotation_range=20,  # rotate the image slightly between 0 and 20 degrees (note: this is an int not a float)
    shear_range=0.2,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
#
# will re-use "train_datagen" from above
#
# Create ImageDataGenerator test instance without data augmentation
test_datagen = ImageDataGenerator(rescale=1/255.)
print("Augmented training images")
train_data_augmented = train_datagen_augmented.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    shuffle=False)
print("Non-augmented training images:")
# NOTE: despite its name, augmented_train_data holds the NON-augmented training images
augmented_train_data = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    shuffle=False)  # Don't shuffle for demonstration purposes
print("Unchanged test images:")
augmented_test_data = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary')
Preview Some "augmented" Images
# get data to preview
images, labels = next(augmented_train_data)
augmented_images, augmented_labels = next(train_data_augmented)  # Note: labels aren't augmented, they stay the same
random_number = random.randint(0, 31)  # we're making batches of size 32, so we'll get a random instance
#
# Show original image and augmented image
#
plt.imshow(images[random_number])
plt.title("Original")
plt.axis(False)
plt.figure()
plt.imshow(augmented_images[random_number])
plt.title("Augmented")
plt.axis(False);
Build The Model
#
# mostly the SAME as m5, BUT using the augmented training data
#
m6 = Sequential([
    # conv-then-max-pool
    Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),
    MaxPool2D(pool_size=2),  # halve height and width (a quarter of the values remain)
    # conv-then-max-pool
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    # conv-then-max-pool
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(1, activation='sigmoid')
])
# Compile the model
m6.compile(loss='binary_crossentropy',
           optimizer=Adam(),
           metrics=['accuracy'])
# Fit the model
# NOTE: TESTING on non-augmented data
m6History = m6.fit(train_data_augmented,  # changed to augmented training data
                   epochs=5,
                   steps_per_epoch=len(train_data_augmented),
                   validation_data=test_data,
                   validation_steps=len(test_data))
Inspect Model
- the accuracy of m6, .62..., is lower than m5
visualise loss & accuracy
plot_loss_curves(m6History)
Build A Model: Augmented AND shuffled
Shuffling the training data can be one way to reduce any learning influenced by the order of the training data.
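The key detail when shuffling is that images and labels must be reordered with the same permutation. Here is a minimal sketch of that idea, using hypothetical file names and a made-up `shuffle_together` helper (Keras handles this internally when shuffle=True).

```python
import random

def shuffle_together(images, labels, seed=42):
    """Shuffle images and labels with the same permutation so pairs stay aligned."""
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)
    return [images[i] for i in indices], [labels[i] for i in indices]

# Hypothetical file names; an unshuffled directory listing groups each class together.
images = ["pizza_1", "pizza_2", "pizza_3", "steak_1", "steak_2", "steak_3"]
labels = [0, 0, 0, 1, 1, 1]

shuffled_images, shuffled_labels = shuffle_together(images, labels)
print(shuffled_images)
print(shuffled_labels)
```

Without shuffling, a model fed the data in directory order would see batches containing only one class at a time, which can skew what it learns early in each epoch.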
Build & Compile
#
# This is ALMOST identical to the above augmentation
# BUT shuffle is set to TRUE
#
train_data_augmented_shuffled = train_datagen_augmented.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    shuffle=True)  # Shuffle data (default)
# Create the model (same as m5 and m6)
m7 = Sequential([
    Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),
    MaxPool2D(),
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(1, activation='sigmoid')
])
# Compile the model
m7.compile(loss='binary_crossentropy',
           optimizer=Adam(),
           metrics=['accuracy'])
# Fit the model
m7History = m7.fit(train_data_augmented_shuffled,  # now the augmented data is shuffled
                   epochs=5,
                   steps_per_epoch=len(train_data_augmented_shuffled),
                   validation_data=test_data,
                   validation_steps=len(test_data))
Inspect Model
- the accuracy of m7, .76..., compared to the "baseline" m4 at .95. The baseline continues to be the best. Very interesting.
visualise loss & accuracy
plot_loss_curves(m7History)
The shapes of the loss curve & accuracy curve look better than the previous model.
Build A Model: Tiny VGG Influence
One way to improve a model is to look for already-existing model architectures. Models have already been developed for these types of goals, and their architectures may be available online.
The CNN Explainer Website uses a Tiny VGG architecture (some code based on the architecture here). Here is a model based on that architecture, trained on the augmented and shuffled training data.
Build
# Create a CNN model (same as Tiny VGG but for binary classification - https://poloclub.github.io/cnn-explainer/ )
m8 = Sequential([
    Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),  # same input shape as our images
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    Conv2D(10, 3, activation='relu'),
    Conv2D(10, 3, activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(1, activation='sigmoid')
])
# Compile the model
m8.compile(loss="binary_crossentropy",
           optimizer=tf.keras.optimizers.Adam(),
           metrics=["accuracy"])
# Fit the model
m8History = m8.fit(train_data_augmented_shuffled,
                   epochs=5,
                   steps_per_epoch=len(train_data_augmented_shuffled),
                   validation_data=test_data,
                   validation_steps=len(test_data))
Inspect
- the accuracy of m8, .77..., compared to the "baseline" m4 at .95. The baseline continues to be the best. Very interesting.
visualise loss & accuracy
plot_loss_curves(m8History)
plot_loss_curves(m4History)
m8History.history['loss']
Predicting Images with the best model
Get An Image
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg
steak_to_predict = mpimg.imread("03-steak.jpeg")
plt.imshow(steak_to_predict)
plt.axis(False);
steak_to_predict.shape
Prepare Image For the Model
Here, a function to help prep images:
- read an image from the file-system
- convert the image to a tensor, including the expected number of color channels (3)
- resize the img
- scale the image tensor values
# import, translate to tensor, resize & rescale
def load_and_prep_image(filename, img_shape=224):
    # Read in target file (an image)
    img = tf.io.read_file(filename)
    # Decode the read file into a tensor & ensure 3 colour channels
    # (our model is trained on images with 3 colour channels and sometimes images have 4 colour channels)
    img = tf.image.decode_image(img, channels=3)
    # Resize the image (to the same size our model was trained on)
    img = tf.image.resize(img, size=[img_shape, img_shape])
    # Rescale the image (get all values between 0 and 1)
    img = img / 255.
    return img
# Load in and preprocess our custom image
preppedSteakImg = load_and_prep_image("03-steak.jpeg")
preppedSteakImg
#
# function to convert a predicted value, between 0-1, to the classification
# AND plot the image on a visual & show the prediction
#
def pred_and_plot(model, filename, class_names):
    """
    Imports an image located at filename, makes a prediction on it with
    a trained model and plots the image with the predicted class as the title.
    """
    # Import the target image and preprocess it
    img = load_and_prep_image(filename)
    # Make a prediction
    pred = model.predict(tf.expand_dims(img, axis=0))
    # Get the predicted class
    pred_class = class_names[int(tf.round(pred)[0][0])]
    # Plot the image and predicted class
    plt.imshow(img)
    plt.title(f"Prediction: {pred_class}")
    plt.axis(False);
Predict
m8.predict(preppedSteakImg)  # raises an error: the batch dimension is missing
The prediction error here is due to a shape mismatch between the image being predicted and the TRAINED image shapes.
preppedSteakImg.shape
train_data_augmented_shuffled[0][0].shape
The trained image shape is (32, 224, 224, 3) and the predicted image shape is (224, 224, 3).
The difference, there, is that the first number in the trained images is 32, which just-so-happens (not a coincidence) to be the batch size.
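To make the mismatch concrete, here is a small NumPy sketch using placeholder arrays of zeros instead of real images. It assumes NumPy's `np.expand_dims` as a stand-in for the TensorFlow equivalent, since both add a new axis of length 1 at the given position.

```python
import numpy as np

# Placeholder arrays with the same shapes as the real data (values are all zeros).
single_image = np.zeros((224, 224, 3), dtype=np.float32)  # height, width, channels
batch = np.zeros((32, 224, 224, 3), dtype=np.float32)     # batch size comes first

# Add a leading axis so the single image becomes a batch containing one image.
batch_of_one = np.expand_dims(single_image, axis=0)

print(single_image.ndim, batch.ndim)  # 3 4
print(batch_of_one.shape)             # (1, 224, 224, 3)
```

The batch size itself does not have to be 32 at prediction time; what matters is that the array has four dimensions with the batch axis first.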
In order to get the predicted image shape to match the trained image shape, the expand_dims function can be used:
shapedPredictionImage = tf.expand_dims(preppedSteakImg, axis=0)
shapedPredictionImage.shape
Predict Again
m8.predict(shapedPredictionImage)
# def pred_and_plot(model, filename, class_names)
pred_and_plot(m8, '03-steak.jpeg', npClassNames)
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg
pred_and_plot(m8, "03-pizza-dad.jpeg", npClassNames)