Multi-class Classification

python
deep learning.ai
machine learning
supervised learning
logistic regression
linear regression
neural network
Author

kakamana

Published

April 30, 2023

Multi-class Classification

This notebook will explore a neural network for multi-class classification.

This Multi-class Classification lab is part of the DeepLearning.AI Machine Learning Specialization, Course 2: Advanced Learning Algorithms (Multiclass classification). In the second course of the specialization you will build and train a neural network with TensorFlow to perform multi-class classification, apply best practices for machine learning development so that your models generalize to real-world data and tasks, and build and use decision trees and tree ensemble methods, including random forests and boosted trees.

This is my learning experience of data science through DeepLearning.AI. These repository contributions are part of my learning journey through my graduate program, the Master of Applied Data Science (MADS) at the University of Michigan, as well as DeepLearning.AI, Coursera & DataCamp. You can find my similar articles & more stories on my Medium & LinkedIn profiles. I am also available on Kaggle & GitHub (blogs & repos). Thank you for your motivation, support & valuable feedback.

These include projects, coursework & notebooks created during my data science journey. They are intended for reproducibility & future reference only. All source code, slides or screenshots are the intellectual property of the respective content authors. If you find this content beneficial, kindly consider a learning subscription from DeepLearning.AI, Coursera, or DataCamp.

Optional Lab - Multi-class Classification

1.1 Goals

In this lab, you will explore an example of multi-class classification using neural networks.

1.2 Tools

You will use some plotting routines. These are stored in lab_utils_multiclass_TF.py in this directory.

Code
import numpy as np
import matplotlib.pyplot as plt
%matplotlib widget
from sklearn.datasets import make_blobs
import tensorflow as tf
devices = tf.config.list_physical_devices()
print(devices)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
np.set_printoptions(precision=2)
from lab_utils_multiclass_TF import *
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Code
# Specify the GPU device to use
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Set the GPU memory growth to True
  try:
    tf.config.experimental.set_memory_growth(gpus[0], True)
  except RuntimeError as e:
    print(e)

2.0 Multi-class Classification

Neural networks are often used to classify data. Examples include networks that:

- take in photos and classify the subjects in the photos as {dog, cat, horse, other}
- take in a sentence and classify the ‘parts of speech’ of its elements: {noun, verb, adjective, etc.}

A network of this type will have multiple units in its final layer. Each output is associated with a category. When an input example is applied to the network, the output with the highest value is the predicted category. If the outputs are passed through a softmax function, the softmax provides the probability of the input belonging to each category.
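
As a minimal illustration of that last point, here is a plain NumPy sketch with made-up output values: the largest raw output picks the category, and a softmax turns the same outputs into probabilities.

Code
import numpy as np

# hypothetical raw outputs from a 4-unit final layer for one input example (illustrative values)
z = np.array([2.0, -1.0, 0.5, 1.3])

# the predicted category is simply the output with the highest value
print("predicted class:", np.argmax(z))

# a softmax converts the same outputs into probabilities that sum to one
probs = np.exp(z) / np.sum(np.exp(z))
print("probabilities:", np.round(probs, 2), "sum:", probs.sum())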

In this lab you will see an example of building a multi-class network in TensorFlow. We will then take a look at how the neural network makes its predictions.

Let’s start by creating a four-class data set.

2.1 Prepare and visualize our data

We will use the Scikit-Learn make_blobs function to make a training data set with 4 categories, as shown in the plot below.

Code
# make 4-class dataset for classification
classes = 4
m = 100
centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
std = 1.0
X_train, y_train = make_blobs(n_samples=m, centers=centers, cluster_std=std, random_state=30)
Code
plt_mc(X_train,y_train,classes, centers, std=std)
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,

Each dot represents a training example. The axes (x0, x1) are the inputs and the color represents the class the example is associated with. Once trained, the model will be presented with a new example, (x0, x1), and will predict its class.

Although synthetically generated, this data set is representative of many real-world classification problems. There are several input features (x0, …, xn) and several output categories. The model is trained to use the input features to predict the correct output category.

Code
# show classes in data set
print(f"unique classes {np.unique(y_train)}")
# show how classes are represented
print(f"class representation {y_train[:10]}")
# show shapes of our dataset
print(f"shape of X_train: {X_train.shape}, shape of y_train: {y_train.shape}")
unique classes [0 1 2 3]
class representation [3 3 3 0 3 3 3 3 2 0]
shape of X_train: (100, 2), shape of y_train: (100,)

2.2 Model

This lab will use a 2-layer network as shown. Unlike the binary classification networks, this network has four outputs, one for each class. Given an input example, the output with the highest value is the predicted class of the input.

Below is an example of how to construct this network in TensorFlow. Notice that the output layer uses a linear rather than a softmax activation. While it is possible to include the softmax in the output layer, it is more numerically stable if linear outputs are passed to the loss function during training. If the model is used to predict probabilities, the softmax can be applied at that point.

Code
tf.random.set_seed(1234)  # applied to achieve consistent results
model = Sequential(
    [
        Dense(2, activation = 'relu',   name = "L1"),
        Dense(4, activation = 'linear', name = "L2")
    ]
)
Metal device set to: Apple M2 Pro
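
As noted above, the softmax could instead be built into the output layer, in which case the loss (introduced in the next cell) would be constructed with from_logits=False. Below is a minimal sketch of that alternative, shown only for contrast; it is mathematically equivalent but less numerically stable, and it is not used in this lab (the name alt_model is purely illustrative).

Code
# sketch only: softmax included in the output layer (not the approach used in this lab)
alt_model = Sequential(
    [
        Dense(2, activation = 'relu',    name = "L1"),
        Dense(4, activation = 'softmax', name = "L2")   # outputs are already probabilities
    ]
)
alt_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.Adam(0.01),
)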

The statements below compile and train the network. Setting from_logits=True as an argument to the loss function specifies that the output activation is linear rather than a softmax.

Code
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(0.01),
)

model.fit(
    X_train,y_train,
    epochs=200
)
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
2023-04-30 19:41:08.828617: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/200
4/4 [==============================] - 2s 13ms/step - loss: 2.7783
Epoch 2/200
4/4 [==============================] - 0s 5ms/step - loss: 2.5076
Epoch 3/200
4/4 [==============================] - 0s 4ms/step - loss: 2.2742
Epoch 4/200
4/4 [==============================] - 0s 4ms/step - loss: 2.0804
Epoch 5/200
4/4 [==============================] - 0s 13ms/step - loss: 1.9146
Epoch 6/200
4/4 [==============================] - 0s 5ms/step - loss: 1.7571
Epoch 7/200
4/4 [==============================] - 0s 3ms/step - loss: 1.6304
Epoch 8/200
4/4 [==============================] - 0s 10ms/step - loss: 1.5166
Epoch 9/200
4/4 [==============================] - 0s 3ms/step - loss: 1.4196
Epoch 10/200
4/4 [==============================] - 0s 3ms/step - loss: 1.3163
Epoch 11/200
4/4 [==============================] - 0s 3ms/step - loss: 1.2404
Epoch 12/200
4/4 [==============================] - 0s 3ms/step - loss: 1.1712
Epoch 13/200
4/4 [==============================] - 0s 3ms/step - loss: 1.1148
Epoch 14/200
4/4 [==============================] - 0s 3ms/step - loss: 1.0667
Epoch 15/200
4/4 [==============================] - 0s 3ms/step - loss: 1.0250
Epoch 16/200
4/4 [==============================] - 0s 3ms/step - loss: 0.9934
Epoch 17/200
4/4 [==============================] - 0s 3ms/step - loss: 0.9623
Epoch 18/200
4/4 [==============================] - 0s 3ms/step - loss: 0.9328
Epoch 19/200
4/4 [==============================] - 0s 3ms/step - loss: 0.9064
Epoch 20/200
4/4 [==============================] - 0s 3ms/step - loss: 0.8810
Epoch 21/200
4/4 [==============================] - 0s 3ms/step - loss: 0.8571
Epoch 22/200
4/4 [==============================] - 0s 3ms/step - loss: 0.8336
Epoch 23/200
4/4 [==============================] - 0s 3ms/step - loss: 0.8117
Epoch 24/200
4/4 [==============================] - 0s 3ms/step - loss: 0.7913
Epoch 25/200
4/4 [==============================] - 0s 3ms/step - loss: 0.7713
Epoch 26/200
4/4 [==============================] - 0s 4ms/step - loss: 0.7534
Epoch 27/200
4/4 [==============================] - 0s 4ms/step - loss: 0.7356
Epoch 28/200
4/4 [==============================] - 0s 4ms/step - loss: 0.7193
Epoch 29/200
4/4 [==============================] - 0s 4ms/step - loss: 0.7038
Epoch 30/200
4/4 [==============================] - 0s 4ms/step - loss: 0.6893
Epoch 31/200
4/4 [==============================] - 0s 3ms/step - loss: 0.6758
Epoch 32/200
4/4 [==============================] - 0s 4ms/step - loss: 0.6631
Epoch 33/200
4/4 [==============================] - 0s 3ms/step - loss: 0.6506
Epoch 34/200
4/4 [==============================] - 0s 4ms/step - loss: 0.6388
Epoch 35/200
4/4 [==============================] - 0s 4ms/step - loss: 0.6280
Epoch 36/200
4/4 [==============================] - 0s 4ms/step - loss: 0.6172
Epoch 37/200
4/4 [==============================] - 0s 4ms/step - loss: 0.6075
Epoch 38/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5978
Epoch 39/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5890
Epoch 40/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5803
Epoch 41/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5724
Epoch 42/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5648
Epoch 43/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5579
Epoch 44/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5509
Epoch 45/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5445
Epoch 46/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5381
Epoch 47/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5318
Epoch 48/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5259
Epoch 49/200
4/4 [==============================] - 0s 3ms/step - loss: 0.5205
Epoch 50/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5153
Epoch 51/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5108
Epoch 52/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5060
Epoch 53/200
4/4 [==============================] - 0s 4ms/step - loss: 0.5016
Epoch 54/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4968
Epoch 55/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4926
Epoch 56/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4882
Epoch 57/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4842
Epoch 58/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4802
Epoch 59/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4767
Epoch 60/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4731
Epoch 61/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4700
Epoch 62/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4666
Epoch 63/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4636
Epoch 64/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4604
Epoch 65/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4574
Epoch 66/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4549
Epoch 67/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4520
Epoch 68/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4494
Epoch 69/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4469
Epoch 70/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4444
Epoch 71/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4421
Epoch 72/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4401
Epoch 73/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4380
Epoch 74/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4357
Epoch 75/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4336
Epoch 76/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4316
Epoch 77/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4297
Epoch 78/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4277
Epoch 79/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4259
Epoch 80/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4241
Epoch 81/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4224
Epoch 82/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4204
Epoch 83/200
4/4 [==============================] - 0s 31ms/step - loss: 0.4187
Epoch 84/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4168
Epoch 85/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4149
Epoch 86/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4134
Epoch 87/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4115
Epoch 88/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4096
Epoch 89/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4078
Epoch 90/200
4/4 [==============================] - 0s 4ms/step - loss: 0.4062
Epoch 91/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4044
Epoch 92/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4026
Epoch 93/200
4/4 [==============================] - 0s 3ms/step - loss: 0.4009
Epoch 94/200
4/4 [==============================] - 0s 4ms/step - loss: 0.3985
Epoch 95/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3962
Epoch 96/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3942
Epoch 97/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3912
Epoch 98/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3890
Epoch 99/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3858
Epoch 100/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3831
Epoch 101/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3804
Epoch 102/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3770
Epoch 103/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3732
Epoch 104/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3692
Epoch 105/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3647
Epoch 106/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3591
Epoch 107/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3536
Epoch 108/200
4/4 [==============================] - 0s 4ms/step - loss: 0.3474
Epoch 109/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3418
Epoch 110/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3360
Epoch 111/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3298
Epoch 112/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3226
Epoch 113/200
4/4 [==============================] - 0s 3ms/step - loss: 0.3158
Epoch 114/200
4/4 [==============================] - 0s 4ms/step - loss: 0.3085
Epoch 115/200
4/4 [==============================] - 0s 3ms/step - loss: 0.2998
Epoch 116/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2921
Epoch 117/200
4/4 [==============================] - 0s 3ms/step - loss: 0.2840
Epoch 118/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2762
Epoch 119/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2677
Epoch 120/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2598
Epoch 121/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2510
Epoch 122/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2421
Epoch 123/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2341
Epoch 124/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2262
Epoch 125/200
4/4 [==============================] - 0s 3ms/step - loss: 0.2189
Epoch 126/200
4/4 [==============================] - 0s 3ms/step - loss: 0.2123
Epoch 127/200
4/4 [==============================] - 0s 4ms/step - loss: 0.2059
Epoch 128/200
4/4 [==============================] - 0s 3ms/step - loss: 0.2004
Epoch 129/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1941
Epoch 130/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1877
Epoch 131/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1821
Epoch 132/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1762
Epoch 133/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1713
Epoch 134/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1668
Epoch 135/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1626
Epoch 136/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1570
Epoch 137/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1523
Epoch 138/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1474
Epoch 139/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1423
Epoch 140/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1370
Epoch 141/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1325
Epoch 142/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1282
Epoch 143/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1241
Epoch 144/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1205
Epoch 145/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1168
Epoch 146/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1134
Epoch 147/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1103
Epoch 148/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1074
Epoch 149/200
4/4 [==============================] - 0s 4ms/step - loss: 0.1045
Epoch 150/200
4/4 [==============================] - 0s 3ms/step - loss: 0.1018
Epoch 151/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0991
Epoch 152/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0967
Epoch 153/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0941
Epoch 154/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0916
Epoch 155/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0889
Epoch 156/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0865
Epoch 157/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0843
Epoch 158/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0823
Epoch 159/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0805
Epoch 160/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0788
Epoch 161/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0769
Epoch 162/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0750
Epoch 163/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0734
Epoch 164/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0716
Epoch 165/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0702
Epoch 166/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0686
Epoch 167/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0672
Epoch 168/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0659
Epoch 169/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0645
Epoch 170/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0632
Epoch 171/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0621
Epoch 172/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0610
Epoch 173/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0599
Epoch 174/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0588
Epoch 175/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0578
Epoch 176/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0567
Epoch 177/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0558
Epoch 178/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0548
Epoch 179/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0542
Epoch 180/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0531
Epoch 181/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0523
Epoch 182/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0516
Epoch 183/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0508
Epoch 184/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0500
Epoch 185/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0492
Epoch 186/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0484
Epoch 187/200
4/4 [==============================] - 0s 6ms/step - loss: 0.0478
Epoch 188/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0470
Epoch 189/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0463
Epoch 190/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0458
Epoch 191/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0452
Epoch 192/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0447
Epoch 193/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0440
Epoch 194/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0434
Epoch 195/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0428
Epoch 196/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0423
Epoch 197/200
4/4 [==============================] - 0s 4ms/step - loss: 0.0416
Epoch 198/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0410
Epoch 199/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0404
Epoch 200/200
4/4 [==============================] - 0s 3ms/step - loss: 0.0398
<keras.callbacks.History at 0x2a8a2efa0>
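
Because the output layer is linear, the trained model produces logits rather than probabilities. As mentioned earlier, a softmax can be applied at prediction time when probabilities are needed; a short sketch on a couple of training examples:

Code
# the model outputs logits; apply a softmax afterwards to obtain per-class probabilities
logits = model.predict(X_train[:2])
probs = tf.nn.softmax(logits).numpy()
for lg, p in zip(logits, probs):
    print(f"logits: {np.round(lg, 2)}, probabilities: {np.round(p, 2)}, predicted class: {np.argmax(p)}")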

With the model trained, we can see how the model has classified the training data.

Code
plt_cat_mc(X_train, y_train, model, classes)
184/184 [==============================] - 0s 2ms/step

Above, the decision boundaries show how the model has partitioned the input space. This very simple model has had no trouble classifying the training data. How did it accomplish this? Let’s look at the network in more detail.

Below, we will pull the trained weights from the model and use them to plot the function of each of the network units. Further down, there is a more detailed explanation of the results. You don’t need to know these details to successfully use neural networks, but it may be helpful to gain more intuition about how the layers combine to solve a classification problem.

Code
# gather the trained parameters from the first layer
l1 = model.get_layer("L1")
W1,b1 = l1.get_weights()
Code
# plot the function of the first layer
plt_layer_relu(X_train, y_train.reshape(-1,), W1, b1, classes)
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,
Code
# gather the trained parameters from the output layer
l2 = model.get_layer("L2")
W2, b2 = l2.get_weights()
# create the 'new features', the training examples after L1 transformation
Xl2 = np.maximum(0, np.dot(X_train,W1) + b1)

plt_output_layer_linear(Xl2, y_train.reshape(-1,), W2, b2, classes,
                        x0_rng = (-0.25,np.amax(Xl2[:,0])), x1_rng = (-0.25,np.amax(Xl2[:,1])))
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,
/Users/kakamana/Library/CloudStorage/OneDrive-Personal/Datascience/Journey/DeepLearning.ai/Machine Learning Specialization/Advanced Learning Algorithms/lab_utils_multiclass_TF.py:63: UserWarning: No data for colormapping provided via 'c'. Parameters 'vmin', 'vmax' will be ignored
  ax.scatter(X[idx, 0], X[idx, 1],  marker=m,

Explanation

Layer 1

These plots show the function of Units 0 and 1 in the first layer of the network. The inputs are (\(x_0,x_1\)) on the axes. The output of each unit is represented by the color of the background, indicated by the color bar on the right of each graph. Notice that since these units use a ReLU, the outputs do not necessarily fall between 0 and 1, and in this case are greater than 20 at their peaks. The contour lines in these graphs show the transition point between the output \(a^{[1]}_j\) being zero and non-zero. Recall the graph of a ReLU: the contour line marks the inflection point of the ReLU.

Unit 0 has separated classes 0 and 1 from classes 2 and 3. Points to the left of the line (classes 0 and 1) will output zero, while points to the right will output a value greater than zero. Unit 1 has separated classes 0 and 2 from classes 1 and 3. Points above the line (classes 0 and 2) will output zero, while points below will output a value greater than zero. The sketch below gives a quick numeric check of this picture; then let’s see how this works out in the next layer!
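
This check reuses W1, b1 and the training data from the cells above; the exact numbers depend on the trained weights, but the mean layer-1 activation should be near zero for classes 0 and 1 on unit 0, and for classes 0 and 2 on unit 1.

Code
# mean layer-1 (ReLU) activation for each class; values depend on the trained weights
A1 = np.maximum(0, X_train @ W1 + b1)
for c in range(classes):
    print(f"class {c}: mean a1_0 = {A1[y_train == c, 0].mean():.2f}, mean a1_1 = {A1[y_train == c, 1].mean():.2f}")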

Layer 2, the output layer

The dots in these graphs are the training examples translated by the first layer. One way to think of this is the first layer has created a new set of features for evaluation by the 2nd layer. The axes in these plots are the outputs of the previous layer \(a^{[1]}_0\) and \(a^{[1]}_1\). As predicted above, classes 0 and 1 (blue and green) have \(a^{[1]}_0 = 0\) while classes 0 and 2 (blue and orange) have \(a^{[1]}_1 = 0\). Once again, the intensity of the background color indicates the highest values. Unit 0 will produce its maximum value for values near (0,0), where class 0 (blue) has been mapped. Unit 1 produces its highest values in the upper left corner selecting class 1 (green). Unit 2 targets the lower right corner where class 2 (orange) resides. Unit 3 produces its highest values in the upper right selecting our final class (purple).
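
Because the output layer is linear, multiplying the transformed features Xl2 by W2 and adding b2 reproduces the network’s final outputs, so the picture above can be checked directly. A quick sketch; on this easy data set the accuracy should be close to 1.0:

Code
# manual forward pass through the output layer using the layer-1 features computed above
Z2 = Xl2 @ W2 + b2                 # linear output-layer values (logits)
pred = np.argmax(Z2, axis=1)       # the unit with the largest value wins
print(f"training accuracy from the manual forward pass: {np.mean(pred == y_train):.2f}")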

One other aspect that is not obvious from the graphs is that the values have been coordinated between the units. It is not sufficient for a unit to produce a maximum value for the class it is selecting for; it must also be the highest value of all the units for points in that class. This is done by the implied softmax function that is part of the loss function (SparseCategoricalCrossentropy with from_logits=True). Unlike other activation functions, the softmax works across all the outputs.
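
A small numeric illustration of that coordination, using made-up logits: the softmax is computed across all four units for each example, so the units compete with one another and the unit with the largest logit also receives the largest probability.

Code
import numpy as np

# hypothetical output-layer logits for three examples (rows); values are illustrative only
Z = np.array([[ 4.0,  0.5, -1.0,  0.2],    # unit 0 dominates -> class 0
              [-0.5,  3.5,  0.1,  0.8],    # unit 1 dominates -> class 1
              [ 0.3, -1.2,  0.4,  5.0]])   # unit 3 dominates -> class 3

# softmax across the outputs (axis=1): each row of probabilities sums to one
expZ = np.exp(Z - Z.max(axis=1, keepdims=True))
P = expZ / expZ.sum(axis=1, keepdims=True)

print(np.round(P, 2))        # rows sum to 1
print(P.argmax(axis=1))      # [0 1 3] -- the same classes as argmax of the raw logits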

You can successfully use neural networks without knowing the details of what each unit is up to. Hopefully, this example has provided some intuition about what is happening under the hood.