Table Of Contents

Modeling And Wrangling
Download Some Data
Create Number-Only Data with One-Hot-Encoding
Split Data: Features & Labels
Split Data: Train & test
Create A Model
Review Model Results
Visualize & Analyze The Loss Curve
Experiment I
Build A New Model Version
Evaluate the model
Review Model Results
Visualise Model Loss
Experiment II

Modeling And Wrangling

Here, python libraries will be used to do some data wrangling prior to building a model:

Download Data from the internet: with pandas read_csv we can pass a url that returns a csv
preview downloaded data with the pandas head() method
Normalize Data Values with sklearn MinMaxScaler and OneHotEncoder
Split data into training & testing with sklearn train_test_split

In this example, some data about medical insurance will be downloaded, wrangled, and used to build a machine-learning model that can predict insurance costs based on age, sex, bmi, children, smoking_status and residential_region

In [1]:

import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

Download Some Data

In [2]:

dataUrl = "https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv"
dataFromWeb = pd.read_csv(dataUrl)
dataFromWeb.head()

Out [2]:

	age	sex	bmi	children	smoker	region	charges
0	19	female	27.900	0	yes	southwest	16884.92400
1	18	male	33.770	1	no	southeast	1725.55230
2	28	male	33.000	3	no	southeast	4449.46200
3	33	male	22.705	0	no	northwest	21984.47061
4	32	male	28.880	0	no	northwest	3866.85520

Create Number-Only Data with One-Hot-Encoding

In [3]:

oheEncodedData = pd.get_dummies(dataFromWeb)
oheEncodedData.head()

Out [3]:

	age	bmi	children	charges	sex_female	sex_male	smoker_no	smoker_yes	region_northeast	region_northwest	region_southeast	region_southwest
0	19	27.900	0	16884.92400	True	False	False	True	False	False	False	True
1	18	33.770	1	1725.55230	False	True	True	False	False	False	True	False
2	28	33.000	3	4449.46200	False	True	True	False	False	False	True	False
3	33	22.705	0	21984.47061	False	True	True	False	False	True	False	False
4	32	28.880	0	3866.85520	False	True	True	False	False	True	False	False

Split Data: Features & Labels

The charges column represent the dependent variable here: the labels.
All the others represent independent variables, the features.

In [4]:

labelField = 'charges'
featureData = oheEncodedData.drop(labelField, axis=1)
labelData = oheEncodedData[labelField]
featureData.head()

Out [4]:

	age	bmi	children	sex_female	sex_male	smoker_no	smoker_yes	region_northeast	region_northwest	region_southeast	region_southwest
0	19	27.900	0	True	False	False	True	False	False	False	True
1	18	33.770	1	False	True	True	False	False	False	True	False
2	28	33.000	3	False	True	True	False	False	False	True	False
3	33	22.705	0	False	True	True	False	False	True	False	False
4	32	28.880	0	False	True	True	False	False	True	False	False

Split Data: Train & test

In [5]:

testDataPercentage = .2 # how much of our data should we use for "testing"
randomVal = 42
feature_training_data, feature_testing_data, label_training_data, label_testing_data = train_test_split(featureData, 
                                                    labelData, 
                                                    test_size=testDataPercentage, 
                                                    random_state=randomVal) # set random state for reproducible splits

In [6]:

feature_training_data.head()

Out [6]:

	age	bmi	children	sex_female	sex_male	smoker_no	smoker_yes	region_northeast	region_northwest	region_southeast	region_southwest
560	46	19.95	2	True	False	True	False	False	True	False	False
1285	47	24.32	0	True	False	True	False	True	False	False	False
1142	52	24.86	0	True	False	True	False	False	False	True	False
969	39	34.32	5	True	False	True	False	False	False	True	False
486	54	21.47	3	True	False	True	False	False	True	False	False

In [7]:

label_training_data.head()

Out [7]:

560      9193.83850
1285     8534.67180
1142    27117.99378
969      8596.82780
486     12475.35130
Name: charges, dtype: float64

Create A Model

In [8]:

epochCount = 100
# Set random seed
tf.random.set_seed(randomVal)

# layers
denseLayer = tf.keras.layers.Dense(1)
# Create a new model (same as model_2)
insurance_model = tf.keras.Sequential()
insurance_model.add(denseLayer)
insurance_model.add(denseLayer)

# Compile the model
insurance_model.compile(loss=tf.keras.losses.mae,
                        optimizer=tf.keras.optimizers.SGD(),
                        metrics=['mae'])

# adjust data type to prevent error 
feature_training_data=feature_training_data.astype(np.float32)
label_training_data=label_training_data.astype(np.float32)
feature_testing_data=feature_testing_data.astype(np.float32)
label_testing_data=label_testing_data.astype(np.float32)


# Fit the model
# save output to a variable
modelHistory = insurance_model.fit(feature_training_data, label_training_data, epochs=epochCount)

Epoch 1/100
34/34 [==============================] - 1s 5ms/step - loss: 12929.0977 - mae: 12929.0977
Epoch 2/100
34/34 [==============================] - 0s 5ms/step - loss: 12084.2998 - mae: 12084.2998
Epoch 3/100
34/34 [==============================] - 0s 4ms/step - loss: 11257.6836 - mae: 11257.6836
Epoch 4/100
34/34 [==============================] - 0s 5ms/step - loss: 10501.6211 - mae: 10501.6211
Epoch 5/100
34/34 [==============================] - 0s 4ms/step - loss: 9854.5127 - mae: 9854.5127
Epoch 6/100
34/34 [==============================] - 0s 4ms/step - loss: 9307.1348 - mae: 9307.1348
Epoch 7/100
34/34 [==============================] - 0s 4ms/step - loss: 8834.4043 - mae: 8834.4043
Epoch 8/100
34/34 [==============================] - 0s 4ms/step - loss: 8449.6650 - mae: 8449.6650
Epoch 9/100
34/34 [==============================] - 0s 4ms/step - loss: 8144.5552 - mae: 8144.5552
Epoch 10/100
34/34 [==============================] - 0s 4ms/step - loss: 7902.4521 - mae: 7902.4521
Epoch 11/100
34/34 [==============================] - 0s 4ms/step - loss: 7714.7314 - mae: 7714.7314
Epoch 12/100
34/34 [==============================] - 0s 4ms/step - loss: 7582.1240 - mae: 7582.1240
Epoch 13/100
34/34 [==============================] - 0s 4ms/step - loss: 7493.8091 - mae: 7493.8091
Epoch 14/100
34/34 [==============================] - 0s 4ms/step - loss: 7433.5542 - mae: 7433.5542
Epoch 15/100
34/34 [==============================] - 0s 4ms/step - loss: 7393.4766 - mae: 7393.4766
Epoch 16/100
34/34 [==============================] - 0s 5ms/step - loss: 7362.8184 - mae: 7362.8184
Epoch 17/100
34/34 [==============================] - 0s 4ms/step - loss: 7341.7515 - mae: 7341.7515
Epoch 18/100
34/34 [==============================] - 0s 5ms/step - loss: 7326.7085 - mae: 7326.7085
Epoch 19/100
34/34 [==============================] - 0s 4ms/step - loss: 7314.8198 - mae: 7314.8198
Epoch 20/100
34/34 [==============================] - 0s 5ms/step - loss: 7305.0923 - mae: 7305.0923
Epoch 21/100
34/34 [==============================] - 0s 5ms/step - loss: 7296.5908 - mae: 7296.5908
Epoch 22/100
34/34 [==============================] - 0s 6ms/step - loss: 7290.1104 - mae: 7290.1104
Epoch 23/100
34/34 [==============================] - 0s 5ms/step - loss: 7285.4897 - mae: 7285.4897
Epoch 24/100
34/34 [==============================] - 0s 5ms/step - loss: 7281.0742 - mae: 7281.0742
Epoch 25/100
34/34 [==============================] - 0s 6ms/step - loss: 7276.6904 - mae: 7276.6904
Epoch 26/100
34/34 [==============================] - 0s 5ms/step - loss: 7272.5249 - mae: 7272.5249
Epoch 27/100
34/34 [==============================] - 0s 4ms/step - loss: 7268.4888 - mae: 7268.4888
Epoch 28/100
34/34 [==============================] - 0s 4ms/step - loss: 7264.3525 - mae: 7264.3525
Epoch 29/100
34/34 [==============================] - 0s 4ms/step - loss: 7260.4541 - mae: 7260.4541
Epoch 30/100
34/34 [==============================] - 0s 4ms/step - loss: 7256.5405 - mae: 7256.5405
Epoch 31/100
34/34 [==============================] - 0s 5ms/step - loss: 7252.4370 - mae: 7252.4370
Epoch 32/100
34/34 [==============================] - 0s 5ms/step - loss: 7248.7378 - mae: 7248.7378
Epoch 33/100
34/34 [==============================] - 0s 4ms/step - loss: 7244.6558 - mae: 7244.6558
Epoch 34/100
34/34 [==============================] - 0s 5ms/step - loss: 7240.7632 - mae: 7240.7632
Epoch 35/100
34/34 [==============================] - 0s 4ms/step - loss: 7237.0293 - mae: 7237.0293
Epoch 36/100
34/34 [==============================] - 0s 4ms/step - loss: 7233.1123 - mae: 7233.1123
Epoch 37/100
34/34 [==============================] - 0s 4ms/step - loss: 7229.2168 - mae: 7229.2168
Epoch 38/100
34/34 [==============================] - 0s 4ms/step - loss: 7225.4897 - mae: 7225.4897
Epoch 39/100
34/34 [==============================] - 0s 4ms/step - loss: 7221.4600 - mae: 7221.4600
Epoch 40/100
34/34 [==============================] - 0s 4ms/step - loss: 7217.6426 - mae: 7217.6426
Epoch 41/100
34/34 [==============================] - 0s 4ms/step - loss: 7213.9087 - mae: 7213.9087
Epoch 42/100
34/34 [==============================] - 0s 4ms/step - loss: 7210.1353 - mae: 7210.1353
Epoch 43/100
34/34 [==============================] - 0s 4ms/step - loss: 7206.1602 - mae: 7206.1602
Epoch 44/100
34/34 [==============================] - 0s 4ms/step - loss: 7202.5337 - mae: 7202.5337
Epoch 45/100
34/34 [==============================] - 0s 4ms/step - loss: 7198.9209 - mae: 7198.9209
Epoch 46/100
34/34 [==============================] - 0s 4ms/step - loss: 7195.1436 - mae: 7195.1436
Epoch 47/100
34/34 [==============================] - 0s 4ms/step - loss: 7191.5684 - mae: 7191.5684
Epoch 48/100
34/34 [==============================] - 0s 4ms/step - loss: 7187.6177 - mae: 7187.6177
Epoch 49/100
34/34 [==============================] - 0s 4ms/step - loss: 7184.2520 - mae: 7184.2520
Epoch 50/100
34/34 [==============================] - 0s 4ms/step - loss: 7180.5498 - mae: 7180.5498
Epoch 51/100
34/34 [==============================] - 0s 4ms/step - loss: 7176.8579 - mae: 7176.8579
Epoch 52/100
34/34 [==============================] - 0s 4ms/step - loss: 7173.0317 - mae: 7173.0317
Epoch 53/100
34/34 [==============================] - 0s 4ms/step - loss: 7169.5488 - mae: 7169.5488
Epoch 54/100
34/34 [==============================] - 0s 4ms/step - loss: 7165.8984 - mae: 7165.8984
Epoch 55/100
34/34 [==============================] - 0s 4ms/step - loss: 7162.1387 - mae: 7162.1387
Epoch 56/100
34/34 [==============================] - 0s 5ms/step - loss: 7158.6626 - mae: 7158.6626
Epoch 57/100
34/34 [==============================] - 0s 5ms/step - loss: 7155.1860 - mae: 7155.1860
Epoch 58/100
34/34 [==============================] - 0s 5ms/step - loss: 7151.6074 - mae: 7151.6074
Epoch 59/100
34/34 [==============================] - 0s 5ms/step - loss: 7148.1851 - mae: 7148.1851
Epoch 60/100
34/34 [==============================] - 0s 6ms/step - loss: 7144.7017 - mae: 7144.7017
Epoch 61/100
34/34 [==============================] - 0s 5ms/step - loss: 7141.2495 - mae: 7141.2495
Epoch 62/100
34/34 [==============================] - 0s 4ms/step - loss: 7137.6250 - mae: 7137.6250
Epoch 63/100
34/34 [==============================] - 0s 4ms/step - loss: 7134.3550 - mae: 7134.3550
Epoch 64/100
34/34 [==============================] - 0s 5ms/step - loss: 7131.1562 - mae: 7131.1562
Epoch 65/100
34/34 [==============================] - 0s 5ms/step - loss: 7127.7969 - mae: 7127.7969
Epoch 66/100
34/34 [==============================] - 0s 4ms/step - loss: 7124.3398 - mae: 7124.3398
Epoch 67/100
34/34 [==============================] - 0s 5ms/step - loss: 7121.2031 - mae: 7121.2031
Epoch 68/100
34/34 [==============================] - 0s 4ms/step - loss: 7117.9922 - mae: 7117.9922
Epoch 69/100
34/34 [==============================] - 0s 4ms/step - loss: 7114.6816 - mae: 7114.6816
Epoch 70/100
34/34 [==============================] - 0s 5ms/step - loss: 7111.5186 - mae: 7111.5186
Epoch 71/100
34/34 [==============================] - 0s 5ms/step - loss: 7108.1860 - mae: 7108.1860
Epoch 72/100
34/34 [==============================] - 0s 5ms/step - loss: 7105.2412 - mae: 7105.2412
Epoch 73/100
34/34 [==============================] - 0s 5ms/step - loss: 7101.9375 - mae: 7101.9375
Epoch 74/100
34/34 [==============================] - 0s 5ms/step - loss: 7098.5718 - mae: 7098.5718
Epoch 75/100
34/34 [==============================] - 0s 4ms/step - loss: 7095.4531 - mae: 7095.4531
Epoch 76/100
34/34 [==============================] - 0s 4ms/step - loss: 7092.1846 - mae: 7092.1846
Epoch 77/100
34/34 [==============================] - 0s 4ms/step - loss: 7089.0986 - mae: 7089.0986
Epoch 78/100
34/34 [==============================] - 0s 4ms/step - loss: 7086.0303 - mae: 7086.0303
Epoch 79/100
34/34 [==============================] - 0s 4ms/step - loss: 7083.0830 - mae: 7083.0830
Epoch 80/100
34/34 [==============================] - 0s 4ms/step - loss: 7079.7832 - mae: 7079.7832
Epoch 81/100
34/34 [==============================] - 0s 4ms/step - loss: 7076.8062 - mae: 7076.8062
Epoch 82/100
34/34 [==============================] - 0s 4ms/step - loss: 7074.0054 - mae: 7074.0054
Epoch 83/100
34/34 [==============================] - 0s 4ms/step - loss: 7071.2134 - mae: 7071.2134
Epoch 84/100
34/34 [==============================] - 0s 4ms/step - loss: 7067.8643 - mae: 7067.8643
Epoch 85/100
34/34 [==============================] - 0s 4ms/step - loss: 7065.1138 - mae: 7065.1138
Epoch 86/100
34/34 [==============================] - 0s 4ms/step - loss: 7062.0625 - mae: 7062.0625
Epoch 87/100
34/34 [==============================] - 0s 5ms/step - loss: 7059.3682 - mae: 7059.3682
Epoch 88/100
34/34 [==============================] - 0s 5ms/step - loss: 7056.2017 - mae: 7056.2017
Epoch 89/100
34/34 [==============================] - 0s 4ms/step - loss: 7053.3081 - mae: 7053.3081
Epoch 90/100
34/34 [==============================] - 0s 4ms/step - loss: 7050.1855 - mae: 7050.1855
Epoch 91/100
34/34 [==============================] - 0s 5ms/step - loss: 7047.3662 - mae: 7047.3662
Epoch 92/100
34/34 [==============================] - 0s 6ms/step - loss: 7044.6016 - mae: 7044.6016
Epoch 93/100
34/34 [==============================] - 0s 6ms/step - loss: 7041.6235 - mae: 7041.6235
Epoch 94/100
34/34 [==============================] - 0s 6ms/step - loss: 7038.6577 - mae: 7038.6577
Epoch 95/100
34/34 [==============================] - 0s 6ms/step - loss: 7035.6338 - mae: 7035.6338
Epoch 96/100
34/34 [==============================] - 0s 5ms/step - loss: 7032.9272 - mae: 7032.9272
Epoch 97/100
34/34 [==============================] - 0s 4ms/step - loss: 7030.1157 - mae: 7030.1157
Epoch 98/100
34/34 [==============================] - 0s 4ms/step - loss: 7027.0903 - mae: 7027.0903
Epoch 99/100
34/34 [==============================] - 0s 4ms/step - loss: 7024.1489 - mae: 7024.1489
Epoch 100/100
34/34 [==============================] - 0s 4ms/step - loss: 7021.0322 - mae: 7021.0322

Review Model Results

In [9]:

# Check the results of the insurance model
insurance_model.evaluate(feature_testing_data, label_testing_data)

9/9 [==============================] - 0s 5ms/step - loss: 7002.0923 - mae: 7002.0923

Out [9]:

[7002.09228515625, 7002.09228515625]

In [10]:

print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'model MAE: {insurance_model.get_metrics_result()["mae"].numpy()}')

Training Label Median: 9575.4423828125
Training Label Mean: 13346.08984375
model MAE: 7002.09228515625

Because the MAE (mean absolute error) is so "large", the model is not great.

Visualize & Analyze The Loss Curve

In [11]:

pd.DataFrame(modelHistory.history).plot()
plt.ylabel('loss')
plt.xlabel('epochs')

Out [11]:

Text(0.5, 0, 'epochs')

The loss score took a large drop toward the beginning of the epochs.
The loss curve "slowed down", and seems to be still dropping toward the end.

Experiment I

different layers
different optimizer fn

Build A New Model Version

In [12]:

insurance_model_2 = tf.keras.Sequential()
modell2EpochCount = 100
# different & more layers
l1 = tf.keras.layers.Dense(100)
l2 = tf.keras.layers.Dense(10)
l3 = tf.keras.layers.Dense(1)

insurance_model_2.add(l1)
insurance_model_2.add(l2)
insurance_model_2.add(l3)

# Compile the model
insurance_model_2.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(), # Adam works but SGD doesn't 
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
model_2_history = insurance_model_2.fit(feature_training_data, label_training_data, epochs=modell2EpochCount, verbose=0)

Evaluate the model

In [13]:

insurance_model_2.evaluate(feature_testing_data, label_testing_data)

9/9 [==============================] - 0s 5ms/step - loss: 4758.9893 - mae: 4758.9893

Out [13]:

[4758.9892578125, 4758.9892578125]

Review Model Results

In [14]:

print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'model_2 MAE: {insurance_model_2.get_metrics_result()["mae"].numpy()}')

Training Label Median: 9575.4423828125
Training Label Mean: 13346.08984375
model_2 MAE: 4758.9892578125

Visualise Model Loss

In [15]:

pd.DataFrame(model_2_history.history).plot()
plt.ylabel('loss')
plt.xlabel('epochs')

Out [15]:

Text(0.5, 0, 'epochs')

Experiment II

In [16]:

insurance_model_3 = tf.keras.Sequential()
model3EpochCount = 200

insurance_model_3.add(l1)
insurance_model_3.add(l2)
insurance_model_3.add(l3)

# Compile the model
insurance_model_3.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(), # Adam works but SGD doesn't 
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
model_3_history = insurance_model_3.fit(feature_training_data, label_training_data, epochs=model3EpochCount, verbose=0)

In [17]:

insurance_model_2.evaluate(feature_testing_data, label_testing_data)
print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'model_3 MAE: {insurance_model_3.get_metrics_result()["mae"].numpy()}')

9/9 [==============================] - 0s 5ms/step - loss: 3230.5137 - mae: 3230.5137
Training Label Median: 9575.4423828125
Training Label Mean: 13346.08984375
model_3 MAE: 3515.3447265625

In [18]:

pd.DataFrame(model_3_history.history).plot()
plt.ylabel('loss')
plt.xlabel('epochs')

Out [18]:

Text(0.5, 0, 'epochs')