Transfer Learning & Experiments
Experiment by creating a few models:
- m0: use feature extraction transfer learning on 10% of the training data
- m1: use feature extraction transfer learning on 1% of the training data, with data augmentation
- m2: use feature extraction transfer learning on 10% of the training data, with data augmentation
  - save the results to a checkpoint
- m3: fine-tune the m2 checkpoint on 10% of the training data, with data augmentation
- m4: fine-tune the m2 checkpoint on 100% of the training data, with data augmentation
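For bookkeeping, the experiment matrix above could be captured as a small config structure (a sketch; the dict layout and key names are illustrative and not used elsewhere in this notebook):

```python
experiments = {
    "m0": {"train_split": 0.10, "augment": False, "fine_tune": False},
    "m1": {"train_split": 0.01, "augment": True,  "fine_tune": False},
    "m2": {"train_split": 0.10, "augment": True,  "fine_tune": False},  # checkpointed
    "m3": {"train_split": 0.10, "augment": True,  "fine_tune": True},   # from m2 checkpoint
    "m4": {"train_split": 1.00, "augment": True,  "fine_tune": True},   # from m2 checkpoint
}

for name, cfg in experiments.items():
    print(name, cfg)
```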
Notebook Goals
- build at least 4 models, running at least 4 experiments on the data (see above)
- Compare the impact on model performance of 2 variables:
- amount of training data
- data augmentation
- Build a function to augment and plot random images, comparing augmented to original (for visual review)
- Use weight "checkpoints" to save model weights and build models from saved weights
- experiment with model variables:
- amount of training data
- number of trainable layers in a "base" model
- number of epochs
Imports
import tensorflow as tf
from tensorflow.keras.callbacks import CSVLogger
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import random
import numpy as np
Download Helper Functions
# Download helper_functions.py script
# !wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
# Import helper functions we're going to use
from helper_functions import create_tensorboard_callback, plot_loss_curves, unzip_data, walk_through_dir
Get Data
#
# 10% set of data based on the food101 dataset
#
# !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip
# unzip_data("10_food_classes_10_percent.zip")
#
# 1%
#
# !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_1_percent.zip
# unzip_data("10_food_classes_1_percent.zip")
#
# ALL data
#
# !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip
# unzip_data("10_food_classes_all_data.zip")
Preview the 1% data
7 images per class in training, 250 images per class in testing
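Those counts can be spot-checked without walk_through_dir (a sketch using pathlib; count_images_per_class is a hypothetical helper, and the glob assumes the images are .jpg files as in the Food101 download):

```python
from pathlib import Path

def count_images_per_class(split_dir):
    """Return {class_name: number_of_jpg_images} for a train/ or test/ directory."""
    return {class_dir.name: len(list(class_dir.glob("*.jpg")))
            for class_dir in Path(split_dir).iterdir() if class_dir.is_dir()}

# e.g. count_images_per_class("10_food_classes_1_percent/train")
# should show ~7 images for each of the 10 classes
```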
walk_through_dir("10_food_classes_1_percent")
Data & Var Prep for multi-model experimentation
#
# 10% of the data
#
data_dir_path_10p = "10_food_classes_10_percent/"
train_dir_path_10p = data_dir_path_10p + "train/"
test_dir_path_10p = data_dir_path_10p + "test/"
#
# 1% of the data
#
data_dir_path_1p = "10_food_classes_1_percent/"
train_dir_path_1p = data_dir_path_1p + "train/"
test_dir_path_1p = data_dir_path_1p + "test/"
#
# ALL of the data
#
data_dir_path_100p = "10_food_classes_all_data/"
train_dir_path_100p = data_dir_path_100p + "train/"
test_dir_path_100p = data_dir_path_100p + "test/"
# data_dir_path_100p, train_dir_path_100p, test_dir_path_100p
IMG_OUTPUT_SIZE = (224, 224)
labelMode = "categorical"
# batchSize = 32
batchSize = 16
Model I: train-on-10%
Split Data: Test & Train
train_data_10p = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir_path_10p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode,
batch_size=batchSize)
test_data_10p = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir_path_10p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode)
#
# inspect training data var
#
train_data_10p
train_data_10p.class_names
#
# see ALL methods available on the new vars
#
# dir(train_data_10p)
#
# preview some data using the "take" method
#
# for images, labels in train_data_10p.take(1):
#     print(images, labels)
Build, Compile & Fit
modelName = 'm0'
lessValidationDataCount = int(0.25 * len(test_data_10p))
csv_logger = CSVLogger(f'{modelName}-log.csv', append=True, separator=';')
#
# pre-trained model
#
# 1. Create base model with tf.keras.applications
# https://www.tensorflow.org/api_docs/python/tf/keras/applications/EfficientNetV2B0
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
# 2. Freeze the base model (so the pre-learned patterns remain)
base_model.trainable = False
#
# custom layer(s)
#
# 3. Create inputLayer into the base model
inputLayer = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")
# "4" If using ResNet50V2, add this to speed up convergence by rescaling inputs
# NOT for EfficientNetV2
# appliedModel = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputLayer)
# 5. Apply the inputLayer to the base_model (note: using tf.keras.applications, EfficientNetV2 inputs don't have to be normalized)
appliedModel = base_model(inputLayer)
# Check data shape after passing it to base_model
print(f"Shape after base_model: {appliedModel.shape}")
# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce number of computations)
appliedModel = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(appliedModel)
print(f"After GlobalAveragePooling2D(): {appliedModel.shape}")
# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(appliedModel)
# 8. Combine the inputLayer with the outputs into a model
m0 = tf.keras.Model(inputLayer, outputs)
# 9. Compile the model
m0.compile(loss='categorical_crossentropy',
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
# 10. Fit the model (we use less steps for validation so it's faster)
m0History = m0.fit(train_data_10p,
epochs=5,
steps_per_epoch=len(train_data_10p),
validation_data=test_data_10p,
# Go through LESS of the validation data so epochs are faster (we want faster experiments!)
validation_steps=lessValidationDataCount,
# Track our model's training logs for visualization later
callbacks=[create_tensorboard_callback("transfer_learning", "10p_feature_extract"), csv_logger])
Inspect Model
There is a base_model, which is JUST the "starting place".
There is also m0, which is the transfer-learned model trained on our data.
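One step from the build above worth unpacking: GlobalAveragePooling2D collapsed the base model's spatial feature maps into one vector per image. With NumPy alone, the operation is just a mean over the height and width axes (the shapes here mirror the 7x7x1280 output printed above):

```python
import numpy as np

# stand-in for a base_model output: batch of 2 feature maps, 7x7 spatial, 1280 channels
features = np.random.rand(2, 7, 7, 1280)

# global average pooling = mean over the height and width axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # → (2, 1280)
```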
Summary
base_model.summary()
m0.summary()
Visualize Loss & Accuracy Curves
plot_loss_curves(m0History)
Model II: train-on-1%-with-aug
Split Data: Test & Train
train_data_1p = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir_path_1p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode,
batch_size=batchSize)
test_data_1p = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir_path_1p,
image_size=IMG_OUTPUT_SIZE,
label_mode=labelMode)
Augment The Training Data
A composed augmentation layer will be built using a Keras Sequential model. The augmentation layer is made of several "inner" layers, each of which "augments" the data as its name describes:
augmentationLayer = keras.Sequential([
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.2),
layers.RandomZoom(0.2),
layers.RandomHeight(0.2),
layers.RandomWidth(0.2),
# preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNetV2B0
], name ="data_augmentation")
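Conceptually, each of these layers applies a random transform per batch during training and acts as an identity at inference time. A horizontal random flip, for example, is just a per-image coin flip over the width axis — a NumPy sketch of the idea (not the Keras implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_flip_horizontal(batch, training=True):
    """Flip each image left-right with probability 0.5; identity at inference."""
    if not training:
        return batch
    out = batch.copy()
    flip_mask = rng.random(len(batch)) < 0.5
    out[flip_mask] = out[flip_mask][:, :, ::-1, :]  # reverse the width axis
    return out

imgs = rng.random((4, 8, 8, 3))        # a fake batch of 4 tiny RGB images
augmented = random_flip_horizontal(imgs, training=True)
print(np.allclose(random_flip_horizontal(imgs, training=False), imgs))  # → True
```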
# # NOTE:
# Previous versions of TensorFlow (e.g. 2.4 and below) used
# tensorflow.keras.layers.experimental.processing:
# augmentationLayer = keras.Sequential([
# preprocessing.RandomFlip("horizontal"),
# preprocessing.RandomRotation(0.2),
# preprocessing.RandomZoom(0.2),
# preprocessing.RandomHeight(0.2),
# preprocessing.RandomWidth(0.2),
# # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNetV2B0
# ], name ="data_augmentation")
Visualize Some Original & Augmented Images
target_class = random.choice(train_data_1p.class_names) # choose a random class
target_dir = "10_food_classes_1_percent/train/" + target_class # create the target directory
random_image = random.choice(os.listdir(target_dir)) # choose a random image from target directory
random_image_path = target_dir + "/" + random_image # create the chosen random image path
img = mpimg.imread(random_image_path) # read in the chosen target image
plt.imshow(img) # plot the target image
plt.title(f"Original random image from class: {target_class}")
plt.axis(False); # turn off the axes
# Reshape, Augment, and ReNormalize
imgWithNewShape = tf.expand_dims(img, axis=0)
augmented_img = augmentationLayer(imgWithNewShape) # data augmentation model requires shape (None, height, width, 3)
normalizedAugmentedImg = tf.squeeze(augmented_img)/255. # requires normalization after augmentation
plt.figure()
plt.imshow(normalizedAugmentedImg)
plt.title(f"Augmented random image from class: {target_class}")
plt.axis(False);
Build, Compile, Fit
# RE-Using inputLayer var from above
# Add in data augmentation Sequential model as a layer
applied1pAugModel = augmentationLayer(inputLayer)
# Give base_model inputLayer (after augmentation) and don't train it
applied1pAugModel = base_model(applied1pAugModel, training=False)
# Pool output features of base model
applied1pAugModel = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(applied1pAugModel)
# Put a dense layer on as the output
applied1pAugOutput = layers.Dense(10, activation="softmax", name="output_layer")(applied1pAugModel)
# Make a model with "inputs" and "outputs"
m1 = keras.Model(inputLayer, applied1pAugOutput)
# Compile the model
m1.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
# Fit the model
history_1p = m1.fit(train_data_1p,
epochs=5,
steps_per_epoch=len(train_data_1p),
validation_data=test_data_1p,
validation_steps=int(0.25* len(test_data_1p)), # validate for less steps
# Track model training logs
callbacks=[create_tensorboard_callback("transfer_learning", "1p_data_aug")])
Model Comparison
- ACCURACY:
  - m0 has val_accuracy of ~88%
  - m1, with augmented data, has val_accuracy of ~48%
  - m0 has significantly higher accuracy
- LOSS CURVE:
  - m0 has a "nicer" loss curve epoch-to-epoch
Inspect Model
Summary
m1.summary()
m1.evaluate(test_data_1p)
Visualize Loss & Accuracy Curves
plot_loss_curves(history_1p)
Model III: train-on-10%-with-aug
This uses a few of the same variables set above for m0, as that model also used 10% of the food101 dataset.
This also uses data augmentation.
This will also save the model, via a "checkpoint", with the help of the TensorFlow callback tf.keras.callbacks.ModelCheckpoint.
Build, Compile
appliedM210pAug = augmentationLayer(inputLayer) # augment our training images
# training=False: https://keras.io/guides/transfer_learning/#build-a-model
# pass augmented images to base model but keep it in inference mode, so batchnorm layers don't get updated
appliedM210pAug = base_model(appliedM210pAug, training=False)
appliedM210pAug = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(appliedM210pAug)
appliedM2Outputs = layers.Dense(10, activation="softmax", name="output_layer")(appliedM210pAug)
m2 = tf.keras.Model(inputLayer, appliedM2Outputs)
# Compile
m2.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # use Adam optimizer with base learning rate
metrics=["accuracy"])
Create New Callback: ModelCheckpoint
Save the model OR the model weights at a given frequency.
Saved checkpoints can be re-loaded "later" and used in later model development.
savedCheckpointPath = "m2_10p_checkpoints_weights/checkpoint.ckpt"
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=savedCheckpointPath,
save_weights_only=True, # set to False to save the entire model
save_best_only=True, # save only the best model weights instead of a model every epoch
save_freq="epoch", # save every epoch
verbose=1)
Fit
m2History = m2.fit(train_data_10p,
epochs=5,
validation_data=test_data_10p,
validation_steps=int(0.25 * len(test_data_10p)), # do less steps per validation (quicker)
callbacks=[create_tensorboard_callback("transfer_learning", "m2_10p_aug"),
checkpoint_callback])
Summarize, Visualize, Inspect
m2_10p_aug_evaluated = m2.evaluate(test_data_10p)
m2_10p_aug_evaluated
plot_loss_curves(m2History)
m2.summary()
Deep Inspection: Layers
m2.layers
for layer_number, layer in enumerate(m2.layers):
    print(f"Layer NUMBER: {layer_number} \t| NAME: {layer.name} \t| TYPE: {layer} \t| Trainable? {layer.trainable}")
Loading a model from a saved checkpoint
Model m2 used the callback that saves weights to a checkpoint.
The saved checkpoint can be used to:
- reload the weights from the file
- evaluate the model
- compare SAVED weights to the NEW weights after using the model to evaluate on data
#
# load weights into a model
#
m2.load_weights(savedCheckpointPath)
#
# evaluate the model against test data AFTER loading the checkpointed weights
#
m2WithLoadedWeights = m2.evaluate(test_data_10p)
Compare Model With And Without Checkpointed Weights
#
# COMPARE evaluate results:
# WITHOUT loaded weights
# WITH loaded weights
#
m2WithLoadedWeights
m2_10p_aug_evaluated
# Check to see if loaded model results are very close to native model results (should output True)
np.isclose(np.array(m2WithLoadedWeights), np.array(m2_10p_aug_evaluated))
# Check the difference between the two results (small values)
print(np.array(m2WithLoadedWeights) - np.array(m2_10p_aug_evaluated))
Model IV: Fine-Tuning Model III
"UnFreezing" some layers in the pre-trained model.
A workflow for fine-tuning:
- build a feature-extracted model
- train the weights in the output layer
- THEN un-freeze some layers and "work backwards" to unfreeze more and more layers
How many layers should be "un-frozen" in a "base" pre-trained model?
There may not be a "consensus" on this topic.
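The unfreeze-the-last-N pattern used below can be wrapped in a small helper so different values of N are easy to try. A sketch against stub objects rather than a real Keras model (unfreeze_last_n and the Stub* classes are illustrative names):

```python
class StubLayer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

class StubModel:
    def __init__(self, layers):
        self.layers = layers
        self.trainable = True

def unfreeze_last_n(model, n):
    """Make only the last n layers trainable; freeze the rest."""
    model.trainable = True  # in Keras, the parent flag must be True first
    for layer in model.layers[:-n]:
        layer.trainable = False
    return model

base = StubModel([StubLayer(f"block_{i}") for i in range(20)])
unfreeze_last_n(base, 10)
print(sum(layer.trainable for layer in base.layers))  # → 10
```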
Inspecting Layers & Base Model
- see the layers in m2
- figure out which layers are currently trainable in m2

m2.layers
for layer_number, layer in enumerate(m2.layers):
    print(f"Layer #{layer_number}\n\tNAME: {layer.name}\n\t TYPE: {layer}\n\tTrainable? {layer.trainable}")

- the efficientnetv2-b0 model is the model we are most interested in "un-freezing"
- the efficientnetv2-b0 model is, itself, a layer in our model
- the efficientnetv2-b0 model is layer #2
m2BaseModel = m2.layers[2]
print(f'm2BaseModel: {m2BaseModel.name}')
# to see ALL OF THE LAYERS IN THAT LAYER...
# for i, lyr in enumerate(m2.layers[2].layers):
#     print(i, lyr.name, lyr.trainable)
# how many trainable variables (weight tensors) in that layer:
print(len(m2BaseModel.trainable_variables))
UnFreeze Some (10) Base-Model Layers
This is the beginning of fine-tuning.
Un-Freeze, Retrain, inspect, rinse & repeat.
# make trainable!
m2BaseModel.trainable = True
# Re-Freeze all layers EXCEPT FOR the last 10
for layer in m2BaseModel.layers[:-10]:
    layer.trainable = False
# Check which layers are NOW tuneable/trainable
for layer_number, layer in enumerate(m2BaseModel.layers):
    if layer.trainable:
        print(f'layer #{layer_number}, {layer.name}, is trainable')
Re-Compile The Model
After making a change to the model, the model needs re-compiling.
Layer trainability has been edited, so the impact on training the model will likely be different.
NOTE: in this compilation, the learning rate will be set to a smaller/more fine-tuned value in order to leverage fine-tuning better. A 10x smaller learning rate is a generally accepted place to start.
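The arithmetic is simple but worth pinning down — the feature-extraction runs above used Adam's default learning rate of 0.001, so the fine-tuning rate becomes:

```python
base_lr = 0.001               # Adam default, used for feature extraction above
fine_tune_lr = base_lr / 10   # 10x smaller for fine-tuning
print(fine_tune_lr)  # → 0.0001
```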
# Recompile the whole model (always recompile after any adjustments to a model)
m2.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # lr is 10x lower than before for fine-tuning
metrics=["accuracy"])
print(f'm2 now has {len(m2.trainable_variables)} trainable vars')
Fit The Model
initialEpochCount = 5
fineTuneEpochCount = initialEpochCount + 5
# Refit the model (same as model_2 except with more trainable layers)
m2FineTuneHistory = m2.fit(train_data_10p,
epochs=fineTuneEpochCount,
validation_data=test_data_10p,
initial_epoch=m2History.epoch[-1], # START from last epoch of previous "m2.fit"
validation_steps=int(0.25 * len(test_data_10p)),
callbacks=[create_tensorboard_callback("transfer_learning", "10p_data_aug_fine_tuned")]) # name experiment appropriately
Evaluate
The m2 model:
- fine-tuned for 5 more epochs
- with 10 "un-frozen" layers in the pre-trained model, re-trained on OUR data!
plot_loss_curves(m2FineTuneHistory)
Compare Fine-Tuned to Pre-Fine-Tuned Model
def compare_historys(original_history, new_history, initial_epochs=5):
"""
Compares two model history objects.
"""
# Get original history measurements
acc = original_history.history["accuracy"]
loss = original_history.history["loss"]
print(len(acc))
val_acc = original_history.history["val_accuracy"]
val_loss = original_history.history["val_loss"]
# Combine original history with new history
total_acc = acc + new_history.history["accuracy"]
total_loss = loss + new_history.history["loss"]
total_val_acc = val_acc + new_history.history["val_accuracy"]
total_val_loss = val_loss + new_history.history["val_loss"]
print(len(total_acc))
print(total_acc)
# Make plots
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(total_acc, label='Training Accuracy')
plt.plot(total_val_acc, label='Validation Accuracy')
plt.plot([initial_epochs-1, initial_epochs-1],
plt.ylim(), label='Start Fine Tuning') # reshift plot around epochs
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(2, 1, 2)
plt.plot(total_loss, label='Training Loss')
plt.plot(total_val_loss, label='Validation Loss')
plt.plot([initial_epochs-1, initial_epochs-1],
plt.ylim(), label='Start Fine Tuning') # reshift plot around epochs
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()
compare_historys(original_history=m2History,
new_history=m2FineTuneHistory,
initial_epochs=5)
Model V: Fine-Tuning with 100% data
Model III trained on 10% of the data.
Model IV (really an edited Model III) fine-tuned the pre-trained model, using 10% of the data.
THIS model, Model V, will start from the checkpoint saved during Model III, "train-on-10%-with-aug":
- use load_weights to get a model set up to be similar to Model III
- "open up" the weights for trainability
- fine-tune the model using 100% of the data (instead of 10% in the previous model, Model IV)
- compare Model III to Model V to see what impact fine-tuning & 100% of the training data have, compared to feature extraction on 10% of the data with data augmentation
# data_dir_path_100p, train_dir_path_100p, test_dir_path_100p
walk_through_dir("10_food_classes_all_data")
Split Data
IMG_SIZE = (224, 224)
train_data_100p = tf.keras.preprocessing.image_dataset_from_directory(train_dir_path_100p,
label_mode="categorical",
image_size=IMG_SIZE)
# Note: this is the same test dataset we've been using for the previous modelling experiments
test_data_100p = tf.keras.preprocessing.image_dataset_from_directory(test_dir_path_100p,
label_mode="categorical",
image_size=IMG_SIZE)
Evaluate m2 with all the test data
m2.evaluate(test_data_100p)
Build & Compile Model
This is the same exact model config as the previous model.
(this could be converted to a function for further re-usability)
# # Create base model
# m4BaseModel = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
# m4BaseModel.trainable = False
# # Setup model input and outputs with data augmentation
# m4InputLayer = layers.Input(shape=(224, 224, 3), name="input_layer")
# appliedM4 = augmentationLayer(m4InputLayer)
# appliedM4 = m4BaseModel(appliedM4, training=False) # pass augmented images to base model but keep it in inference mode
# appliedM4 = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(appliedM4)
# m4OutputLayer = layers.Dense(units=10, activation="softmax", name="output_layer")(appliedM4)
# m4 = tf.keras.Model(m4InputLayer, m4OutputLayer)
# # Compile
# m4.compile(loss="categorical_crossentropy",
# optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
# metrics=["accuracy"])
Load Weights into model
"Revert" m2 "back" to its "state" where it was trained on 10% of the data, with augmentation, which is marked as Model III in this doc.
With m2 reverted to that state, this next model, "Model V", can be more clearly compared to Model III.
m2.load_weights(savedCheckpointPath)
NOTE: the model cannot be restored from weights alone.
The model must be re-compiled THEN weights re-loaded.
Interesting tidbit here.
Re-Compile The Model
m2.compile(loss="categorical_crossentropy",
optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # lr is 10x lower than before for fine-tuning
metrics=["accuracy"])
Load From Weights
m2.load_weights(savedCheckpointPath)Re-Evaluate The Model
Now that the model is "back" to its m2 state from Model III, the model should evaluate differently than the fine-tuned model evaluated above.
After evaluating on the same data as the fine-tuned model:
m2.evaluate(test_data_100p)
accuracy is ~83%, loss is ~0.56.
The fine-tuned model's accuracy was ~85% and loss was ~0.44.
print(len(m2.trainable_variables))
# Check which layers are NOW tuneable/trainable
for layer_number, layer in enumerate(m2.layers):
    if layer.trainable:
        print(f'layer #{layer_number}, {layer.name}, is trainable')
m2.layers[2].name
# Check which layers are NOW tuneable/trainable
for layer_number, layer in enumerate(m2.layers[2].layers):
    if layer.trainable:
        print(f'layer #{layer_number}, {layer.name}, is trainable')
NOTICE: here, 10 layers of the efficientnetv2-b0 model are trainable.
Fit
m2FineTuned100P = m2.fit(train_data_100p,
epochs=10,
initial_epoch=m2History.epoch[-1],
validation_data=test_data_100p,
validation_steps=int(0.25 * len(test_data_100p)),
callbacks=[create_tensorboard_callback("transfer_learning", "m2_100p_fine-tuned")])
Evaluate
m5Evaluation = m2.evaluate(test_data_100p)
compare_historys(original_history=m2History,
new_history=m2FineTuned100P,
initial_epochs=5)
# %load_ext tensorboard
# %tensorboard --logdir="transfer_learning"
# edit: needs port review w. docker....