Transfer Learning & Experiments
Experiment by creating a few models:
- m0: use feature extraction transfer learning on 10% of the training data
- m1: use feature extraction transfer learning on 1% of the training data, with data augmentation
- m2: use feature extraction transfer learning on 10% of the training data, with data augmentation
  - save the results to a checkpoint
- m3: fine-tune the m2 checkpoint on 10% of the training data, with data augmentation
- m4: fine-tune the m2 checkpoint on 100% of the training data, with data augmentation
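For bookkeeping, the experiment matrix above could be captured as a small config structure (a sketch; the dict layout and key names are illustrative and not used elsewhere in this notebook):

```python
experiments = {
    "m0": {"train_split": 0.10, "augment": False, "fine_tune": False},
    "m1": {"train_split": 0.01, "augment": True,  "fine_tune": False},
    "m2": {"train_split": 0.10, "augment": True,  "fine_tune": False},  # checkpointed
    "m3": {"train_split": 0.10, "augment": True,  "fine_tune": True},   # from m2 checkpoint
    "m4": {"train_split": 1.00, "augment": True,  "fine_tune": True},   # from m2 checkpoint
}

for name, cfg in experiments.items():
    print(name, cfg)
```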
Notebook Goals
- build at least 4 models, running at least 4 experiments on the data (see above)
- Compare the impact on model performance of 2 variables:
- amount of training data
- data augmentation
- Build a function to augment and plot random images, comparing augmented to original (for visual review)
- Use weight "checkpoints" to save model weights and build models from saved weights
- experiment with model variables:
- amount of training data
- number of trainable layers in a "base" model
- number of epochs
Imports
import tensorflow as tf
from tensorflow.keras.callbacks import CSVLogger
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import random
import numpy as np
Download Helper Functions
# Download helper_functions.py script
# !wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
# Import helper functions we're going to use
from helper_functions import create_tensorboard_callback, plot_loss_curves, unzip_data, walk_through_dir
Get Data
#
# 10% set of data based on the food101 dataset
#
# !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip
# unzip_data("10_food_classes_10_percent.zip")
#
# 1%
#
# !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_1_percent.zip
# unzip_data("10_food_classes_1_percent.zip")
#
# ALL data
#
# !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip
# unzip_data("10_food_classes_all_data.zip")
Preview the 1% data
7 images per class in training, 250 images per class in testing
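Those counts can be spot-checked without walk_through_dir (a sketch using pathlib; count_images_per_class is a hypothetical helper, and the glob assumes the images are .jpg files as in the Food101 download):

```python
from pathlib import Path

def count_images_per_class(split_dir):
    """Return {class_name: number_of_jpg_images} for a train/ or test/ directory."""
    return {class_dir.name: len(list(class_dir.glob("*.jpg")))
            for class_dir in Path(split_dir).iterdir() if class_dir.is_dir()}

# e.g. count_images_per_class("10_food_classes_1_percent/train")
# should show ~7 images for each of the 10 classes
```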
walk_through_dir("10_food_classes_1_percent")
Data & Var Prep for multi-model experimentation
#
# 10% of the data
#
data_dir_path_10p = "10_food_classes_10_percent/"
train_dir_path_10p = data_dir_path_10p + "train/"
test_dir_path_10p = data_dir_path_10p + "test/"
#
# 1% of the data
#
data_dir_path_1p = "10_food_classes_1_percent/"
train_dir_path_1p = data_dir_path_1p + "train/"
test_dir_path_1p = data_dir_path_1p + "test/"
#
# ALL of the data
#
data_dir_path_100p = "10_food_classes_all_data/"
train_dir_path_100p = data_dir_path_100p + "train/"
test_dir_path_100p = data_dir_path_100p + "test/"
# data_dir_path_100p, train_dir_path_100p, test_dir_path_100p
IMG_OUTPUT_SIZE = (224, 224)
labelMode = "categorical"
# batchSize = 32
batchSize = 16
Model I: train-on-10%
Split Data: Test & Train
train_data_10p = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir_path_10p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode,
batch_size=batchSize)
test_data_10p = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir_path_10p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode)
#
# inspect training data var
#
train_data_10p
train_data_10p.class_names
#
# see ALL methods available on the new vars
#
# dir(train_data_10p)
#
# preview some data using the "take" method
#
# for images, labels in train_data_10p.take(1):
#     print(images, labels)
Build, Compile & Fit
modelName = 'm0'
lessValidationDataCount = int(0.25 * len(test_data_10p))
csv_logger = CSVLogger(f'{modelName}-log.csv', append=True, separator=';')
#
# pre-trained model
#
# 1. Create base model with tf.keras.applications
# https://www.tensorflow.org/api_docs/python/tf/keras/applications/EfficientNetV2B0
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
# 2. Freeze the base model (so the pre-learned patterns remain)
base_model.trainable = False
#
# custom layer(s)
#
# 3. Create inputLayer into the base model
inputLayer = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")
# "4" If using ResNet50V2, add this to speed up convergence by rescaling inputs
# NOT for EfficientNetV2
# appliedModel = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputLayer)
# 5. Apply the inputLayer to the base_model (note: using tf.keras.applications, EfficientNetV2 inputs don't have to be normalized)
appliedModel = base_model(inputLayer)
# Check data shape after passing it to base_model
print(f"Shape after base_model: {appliedModel.shape}")
# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce number of computations)
appliedModel = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(appliedModel)
print(f"After GlobalAveragePooling2D(): {appliedModel.shape}")
# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(appliedModel)
# 8. Combine the inputLayer with the outputs into a model
m0 = tf.keras.Model(inputLayer, outputs)
# 9. Compile the model
m0.compile(loss='categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
# 10. Fit the model (we use less steps for validation so it's faster)
m0History = m0.fit(train_data_10p,
epochs=5,
steps_per_epoch=len(train_data_10p),
validation_data=test_data_10p,
# Go through LESS of the validation data so epochs are faster (we want faster experiments!)
validation_steps=lessValidationDataCount,
# Track our model's training logs for visualization later
callbacks=[create_tensorboard_callback("transfer_learning", "10p_feature_extract"), csv_logger])
Inspect Model
There is a base_model, which is JUST the "starting place".
There is also m0, which is the transfer-learned model trained on our data.
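One step from the build above worth unpacking: GlobalAveragePooling2D collapsed the base model's spatial feature maps into one vector per image. With NumPy alone, the operation is just a mean over the height and width axes (the shapes here mirror the 7x7x1280 output printed above):

```python
import numpy as np

# stand-in for a base_model output: batch of 2 feature maps, 7x7 spatial, 1280 channels
features = np.random.rand(2, 7, 7, 1280)

# global average pooling = mean over the height and width axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # → (2, 1280)
```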
Summary
base_model.summary()
m0.summary()
Visualize Loss & Accuracy Curves
plot_loss_curves(m0History)
Model II: train-on-1%-with-aug
Split Data: Test & Train
train_data_1p = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir_path_1p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode,
batch_size=batchSize)
test_data_1p = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir_path_1p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode)
Augment The Training Data
A composed augmentation layer will be built using a Keras Sequential model. The augmentation layer is made of several "inner" layers, each of which "augments" the data as its name describes:
augmentationLayer = keras.Sequential([
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.2),
layers.RandomZoom(0.2),
layers.RandomHeight(0.2),
layers.RandomWidth(0.2),
# preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNetV2B0
], name ="data_augmentation")
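Conceptually, each of these layers applies a random transform per batch during training and acts as an identity at inference time. A horizontal random flip, for example, is just a per-image coin flip over the width axis — a NumPy sketch of the idea (not the Keras implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_flip_horizontal(batch, training=True):
    """Flip each image left-right with probability 0.5; identity at inference."""
    if not training:
        return batch
    out = batch.copy()
    flip_mask = rng.random(len(batch)) < 0.5
    out[flip_mask] = out[flip_mask][:, :, ::-1, :]  # reverse the width axis
    return out

imgs = rng.random((4, 8, 8, 3))        # a fake batch of 4 tiny RGB images
augmented = random_flip_horizontal(imgs, training=True)
print(np.allclose(random_flip_horizontal(imgs, training=False), imgs))  # → True
```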
# # NOTE:
# Previous versions of TensorFlow (e.g. 2.4 and below) used
# tensorflow.keras.layers.experimental.processing:
# augmentationLayer = keras.Sequential([
# preprocessing.RandomFlip("horizontal"),
# preprocessing.RandomRotation(0.2),
# preprocessing.RandomZoom(0.2),
# preprocessing.RandomHeight(0.2),
# preprocessing.RandomWidth(0.2),
# # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNetV2B0
# ], name ="data_augmentation")
Visualize Some Original & Augmented Images
target_class = random.choice(train_data_1p.class_names) # choose a random class
target_dir = "10_food_classes_1_percent/train/" + target_class # create the target directory
random_image = random.choice(os.listdir(target_dir)) # choose a random image from target directory
random_image_path = target_dir + "/" + random_image # create the chosen random image path
img = mpimg.imread(random_image_path) # read in the chosen target image
plt.imshow(img) # plot the target image
plt.title(f"Original random image from class: {target_class}")
plt.axis(False); # turn off the axes
# Reshape, Augment, and ReNormalize
imgWithNewShape = tf.expand_dims(img, axis=0)
augmented_img = augmentationLayer(imgWithNewShape) # data augmentation model requires shape (None, height, width, 3)
normalizedAugmentedImg = tf.squeeze(augmented_img)/255. # requires normalization after augmentation
plt.figure()
plt.imshow(normalizedAugmentedImg)
plt.title(f"Augmented random image from class: {target_class}")
plt.axis(False);
Build, Compile, Fit
# RE-Using inputLayer var from above
# Add in data augmentation Sequential model as a layer
applied1pAugModel = augmentationLayer(inputLayer)
# Give base_model inputLayer (after augmentation) and don't train it
applied1pAugModel = base_model(applied1pAugModel, training=False)
# Pool output features of base model
applied1pAugModel = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(applied1pAugModel)
# Put a dense layer on as the output
applied1pAugOutput = layers.Dense(10, activation="softmax", name="output_layer")(applied1pAugModel)
# Make a model with "inputs" and "outputs"
m1 = keras.Model(inputLayer, applied1pAugOutput)
# Compile the model
m1.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
# Fit the model
history_1p = m1.fit(train_data_1p,
epochs=5,
steps_per_epoch=len(train_data_1p),
validation_data=test_data_1p,
validation_steps=int(0.25* len(test_data_1p)), # validate for less steps
# Track model training logs
callbacks=[create_tensorboard_callback("transfer_learning", "1p_data_aug")])
Model Comparison
- ACCURACY:
  - m0 has val_accuracy of ~88%
  - m1, with augmented data, has val_accuracy of ~48%
  - m0 has significantly higher accuracy
- LOSS CURVE:
  - m0 has a "nicer" loss curve epoch-to-epoch
Inspect Model
Summary
m1.summary()
m1.evaluate(test_data_1p)
Visualize Loss & Accuracy Curves
plot_loss_curves(history_1p)
Model III: train-on-10%-with-aug
This uses a few of the same variables set above for m0, as that model also used 10% of the food101 dataset.
This also uses data augmentation.
This will also save the model, via a "checkpoint", with the help of the TensorFlow callback tf.keras.callbacks.ModelCheckpoint.
Build, Compile
appliedM210pAug = augmentationLayer(inputLayer) # augment our training images
# training=False: https://keras.io/guides/transfer_learning/#build-a-model
# pass augmented images to base model but keep it in inference mode, so batchnorm layers don't get updated
appliedM210pAug = base_model(appliedM210pAug, training=False)
appliedM210pAug = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(appliedM210pAug)
appliedM2Outputs = layers.Dense(10, activation="softmax", name="output_layer")(appliedM210pAug)
m2 = tf.keras.Model(inputLayer, appliedM2Outputs)
# Compile
m2.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # use Adam optimizer with base learning rate
metrics=["accuracy"])
Create New Callback: ModelCheckpoint
Save the model OR the model weights at a given frequency.
Saved checkpoints can be re-loaded "later" and used in later model development.
savedCheckpointPath = "m2_10p_checkpoints_weights/checkpoint.ckpt"
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=savedCheckpointPath,
save_weights_only=True, # set to False to save the entire model
save_best_only=True, # save only the best model weights instead of a model every epoch
save_freq="epoch", # save every epoch
verbose=1)
Fit
m2History = m2.fit(train_data_10p,
epochs=5,
validation_data=test_data_10p,
validation_steps=int(0.25 * len(test_data_10p)), # do less steps per validation (quicker)
callbacks=[create_tensorboard_callback("transfer_learning", "m2_10p_aug"),
checkpoint_callback])
Summarize, Visualize, Inspect
m2_10p_aug_evaluated = m2.evaluate(test_data_10p)
m2_10p_aug_evaluated
plot_loss_curves(m2History)
m2.summary()
Deep Inspection: Layers
m2.layers
for layer_number, layer in enumerate(m2.layers):
    print(f"Layer NUMBER: {layer_number} \t| NAME: {layer.name} \t| TYPE: {layer} \t| Trainable? {layer.trainable}")
Loading a model from a saved checkpoint
Model m2 used the callback that saves weights to a checkpoint.
The saved checkpoint can be used to:
- reload the weights from the file
- evaluate the model
- compare SAVED weights to the NEW weights after using the model to evaluate on data
#
# load weights into a model
#
m2.load_weights(savedCheckpointPath)
#
# evaluate the model against test data AFTER loading the checkpointed weights
#
m2WithLoadedWeights = m2.evaluate(test_data_10p)
Compare Model With And Without Checkpointed Weights
#
# COMPARE evaluate results:
# WITHOUT loaded weights
# WITH loaded weights
#
m2WithLoadedWeights
m2_10p_aug_evaluated
# Check to see if loaded model results are very close to native model results (should output True)
np.isclose(np.array(m2WithLoadedWeights), np.array(m2_10p_aug_evaluated))
# Check the difference between the two results (small values)
print(np.array(m2WithLoadedWeights) - np.array(m2_10p_aug_evaluated))
Model IV: Fine-Tuning Model III
"UnFreezing" some layers in the pre-trained model.
A workflow for fine-tuning:
- build a feature-extracted model
- train the weights in the output layer
- THEN un-freeze some layers and "work backwards" to unfreeze more and more layers
How many layers should be "un-frozen" in a "base" pre-trained model?
There may not be a "consensus" on this topic.
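The unfreeze-the-last-N pattern used below can be wrapped in a small helper so different values of N are easy to try. A sketch against stub objects rather than a real Keras model (unfreeze_last_n and the Stub* classes are illustrative names):

```python
class StubLayer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

class StubModel:
    def __init__(self, layers):
        self.layers = layers
        self.trainable = True

def unfreeze_last_n(model, n):
    """Make only the last n layers trainable; freeze the rest."""
    model.trainable = True  # in Keras, the parent flag must be True first
    for layer in model.layers[:-n]:
        layer.trainable = False
    return model

base = StubModel([StubLayer(f"block_{i}") for i in range(20)])
unfreeze_last_n(base, 10)
print(sum(layer.trainable for layer in base.layers))  # → 10
```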
Inspecting Layers & Base Model
- see the layers in m2
- figure out which layers are currently trainable in m2

m2.layers
for layer_number, layer in enumerate(m2.layers):
    print(f"Layer #{layer_number}\n\tNAME: {layer.name}\n\t TYPE: {layer}\n\tTrainable? {layer.trainable}")

- the efficientnetv2-b0 model is the model we are most interested in "un-freezing"
- the efficientnetv2-b0 model is, itself, a layer in our model
- the efficientnetv2-b0 model is layer #2
m2BaseModel = m2.layers[2]
print(f'm2BaseModel: {m2BaseModel.name}')
# to see ALL OF THE LAYERS IN THAT LAYER...
# for i, lyr in enumerate(m2.layers[2].layers):
#     print(i, lyr.name, lyr.trainable)
# how many trainable variables (weight tensors) in that layer:
print(len(m2BaseModel.trainable_variables))
UnFreeze Some (10) Base-Model Layers
This is the beginning of fine-tuning.
Un-Freeze, Retrain, inspect, rinse & repeat.
# make trainable!
m2BaseModel.trainable = True
# Re-Freeze all layers EXCEPT FOR the last 10
for layer in m2BaseModel.layers[:-10]:
    layer.trainable = False
# Check which layers are NOW tuneable/trainable
for layer_number, layer in enumerate(m2BaseModel.layers):
    if layer.trainable:
        print(f'layer #{layer_number}, {layer.name}, is trainable')
Re-Compile The Model
After making a change to the model, the model needs re-compiling.
Layer trainability has been edited, so the impact on training the model will likely be different.
NOTE: in this compilation, the learning rate will be set to a smaller/more fine-tuned value in order to leverage fine-tuning better. A 10x smaller learning rate is a generally accepted place to start.
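The arithmetic is simple but worth pinning down — the feature-extraction runs above used Adam's default learning rate of 0.001, so the fine-tuning rate becomes:

```python
base_lr = 0.001               # Adam default, used for feature extraction above
fine_tune_lr = base_lr / 10   # 10x smaller for fine-tuning
print(fine_tune_lr)  # → 0.0001
```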
# Recompile the whole model (always recompile after any adjustments to a model)
m2.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # lr is 10x lower than before for fine-tuning
metrics=["accuracy"])
print(f'm2 now has {len(m2.trainable_variables)} trainable vars')
Fit The Model
initialEpochCount = 5
fineTuneEpochCount = initialEpochCount + 5
# Refit the model (same as model_2 except with more trainable layers)
m2FineTuneHistory = m2.fit(train_data_10p,
epochs=fineTuneEpochCount,
validation_data=test_data_10p,
initial_epoch=m2History.epoch[-1], # START from last epoch of previous "m2.fit"
validation_steps=int(0.25 * len(test_data_10p)),
callbacks=[create_tensorboard_callback("transfer_learning", "10p_data_aug_fine_tuned")]) # name experiment appropriately
Evaluate
The m2 model:
- fine-tuned for 5 more epochs
- with 10 "un-frozen" layers in the pre-trained model, re-trained on OUR data!
plot_loss_curves(m2FineTuneHistory)
Compare Fine-Tuned to Pre-Fine-Tuned Model
def compare_historys(original_history, new_history, initial_epochs=5):
"""
Compares two model history objects.
"""
# Get original history measurements
acc = original_history.history["accuracy"]
loss = original_history.history["loss"]
print(len(acc))
val_acc = original_history.history["val_accuracy"]
val_loss = original_history.history["val_loss"]
# Combine original history with new history
total_acc = acc + new_history.history["accuracy"]
total_loss = loss + new_history.history["loss"]
total_val_acc = val_acc + new_history.history["val_accuracy"]
total_val_loss = val_loss + new_history.history["val_loss"]
print(len(total_acc))
print(total_acc)
# Make plots
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(total_acc, label='Training Accuracy')
plt.plot(total_val_acc, label='Validation Accuracy')
plt.plot([initial_epochs-1, initial_epochs-1],
plt.ylim(), label='Start Fine Tuning') # reshift plot around epochs
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(2, 1, 2)
plt.plot(total_loss, label='Training Loss')
plt.plot(total_val_loss, label='Validation Loss')
plt.plot([initial_epochs-1, initial_epochs-1],
plt.ylim(), label='Start Fine Tuning') # reshift plot around epochs
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()
compare_historys(original_history=m2History,
new_history=m2FineTuneHistory,
initial_epochs=5)
Model V: Fine-Tuning with 100% data
Model III trained on 10% of the data.
Model IV (really an edited Model III) fine-tuned the pre-trained model, using 10% of the data.
THIS model, Model V, will start from the checkpoint saved during Model III, "train-on-10%-with-aug":
- use load_weights to get a model set up to be similar to Model III
- "open up" the weights for trainability
- fine-tune the model using 100% of the data (instead of 10% in the previous model, Model IV)
- compare Model III to Model V to see what impact fine-tuning & 100% of the training data have, compared to feature extraction on 10% of the data with data augmentation
# data_dir_path_100p, train_dir_path_100p, test_dir_path_100p
walk_through_dir("10_food_classes_all_data")
Split Data
IMG_SIZE = (224, 224)
train_data_100p = tf.keras.preprocessing.image_dataset_from_directory(train_dir_path_100p,
label_mode="categorical",
image_size=IMG_SIZE)
# Note: this is the same test dataset we've been using for the previous modelling experiments
test_data_100p = tf.keras.preprocessing.image_dataset_from_directory(test_dir_path_100p,
label_mode="categorical",
image_size=IMG_SIZE)
Evaluate m2 with all the test data
m2.evaluate(test_data_100p)
Build & Compile Model
This is the same exact model config as the previous model.
(this could be converted to a function for further re-usability)
# # Create base model
# m4BaseModel = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
# m4BaseModel.trainable = False
# # Setup model input and outputs with data augmentation
# m4InputLayer = layers.Input(shape=(224, 224, 3), name="input_layer")
# appliedM4 = augmentationLayer(m4InputLayer)
# appliedM4 = m4BaseModel(appliedM4, training=False) # pass augmented images to base model but keep it in inference mode
# appliedM4 = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(appliedM4)
# m4OutputLayer = layers.Dense(units=10, activation="softmax", name="output_layer")(appliedM4)
# m4 = tf.keras.Model(m4InputLayer, m4OutputLayer)
# # Compile
# m4.compile(loss="categorical_crossentropy",
# optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
# metrics=["accuracy"])
Load Weights into model
"Revert" m2 "back" to its "state" where it was trained on 10% of the data, with augmentation, which is marked as Model III in this doc.
With m2 reverted to that state, this next model, "Model V", can be more clearly compared to Model III.
m2.load_weights(savedCheckpointPath)
NOTE: the model cannot be restored from weights alone.
The model must be re-compiled THEN weights re-loaded.
Interesting tidbit here.
Re-Compile The Model
m2.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # lr is 10x lower than before for fine-tuning
metrics=["accuracy"])
Load From Weights
m2.load_weights(savedCheckpointPath)Re-Evaluate The Model
Now that the model is "back" to its m2 state from Model III, the model should evaluate differently than the fine-tuned model evaluated above.
After evaluating on the same data as the fine-tuned model:
m2.evaluate(test_data_100p)
accuracy is ~83%, loss is ~0.56.
The fine-tuned model's accuracy was ~85% and loss was ~0.44.
print(len(m2.trainable_variables))
# Check which layers are NOW tuneable/trainable
for layer_number, layer in enumerate(m2.layers):
    if layer.trainable:
        print(f'layer #{layer_number}, {layer.name}, is trainable')
m2.layers[2].name
# Check which layers are NOW tuneable/trainable
for layer_number, layer in enumerate(m2.layers[2].layers):
    if layer.trainable:
        print(f'layer #{layer_number}, {layer.name}, is trainable')
NOTICE: here, 10 layers of the efficientnetv2-b0 model are trainable.
Fit
m2FineTuned100P = m2.fit(train_data_100p,
epochs=10,
initial_epoch=m2History.epoch[-1],
validation_data=test_data_100p,
validation_steps=int(0.25 * len(test_data_100p)),
callbacks=[create_tensorboard_callback("transfer_learning", "m2_100p_fine-tuned")])
Evaluate
m5Evaluation = m2.evaluate(test_data_100p)
compare_historys(original_history=m2History,
new_history=m2FineTuned100P,
initial_epochs=5)
# %load_ext tensorboard
# %tensorboard --logdir="transfer_learning"
# edit: needs port review w. docker....