Using Modern AI and Python to Detect Focal Points of a Face
- Ibrahim Element
- Jun 17, 2022
- 11 min read
Now let's be honest, we've all heard about AI, either from television or through a friend - but what exactly is AI (Artificial Intelligence)? If you're a regular user of a computer, you most likely interact with AI on a daily basis. Ranging from browsing YouTube, which generates video suggestions based on your activity, to searching for something on Google, which generates search results based on thousands of indicators. In this article, we're going to plunge head-first into AI and become more familiar with what AI is, where it is used, as well as develop a practical implementation using AI.

*Figure 1: Artificial Neural Network Topology
What Can you Expect in this Article?
In this article I am going to go over artificial intelligence from an abstract point of view, how it relates to the world, and the power it holds for software developers interested in this topic. I will speak about the differences between machine learning and deep learning, the concept of AI and its origins, and finally - an actual implementation of AI using Python to solve a problem.
I highly suggest that readers of this article watch the videos posted, as there is a lot of information to consume, and I have included additional explanations for those who are interested. The practical implementation in the later sections of this article uses AI to develop a model that can accurately detect the focal points of a face (nose, eyes, mouth, ears). Now, let's begin.
What Exactly is Artificial Intelligence?
Before we begin to explore and build out a practical implementation using AI, I want to first introduce readers to some of the critical concepts and terminologies that are crucial to anyone who wishes to possess an enhanced understanding of AI. First and foremost, AI is intelligence demonstrated by machines - as opposed to natural intelligence displayed by humans. The term has been used to describe machines that mimic and display "human" cognitive skills that are associated with the human mind, such as "learning" and "problem solving". Contrary to popular understanding, this terminology has been rejected by experts in the field as they claim that it holds a very narrow description of what AI can actually do...
By the 1950s, there were two major approaches to creating machine intelligence: one was a heuristic approach that sought to model out the problem domain; the second was to learn from how the human mind works! The human brain has over 100 billion neurons, all of which are connected to each other through various pathways. They communicate with each other through electrical signals and chemical interactions; much of how the brain works is still referred to as a "black box" - otherwise unknown.

*Figure 2: Common terminologies and how they relate to each other.
Machine Learning | With limited explicit programming, developers aim to teach a model how to solve a specific task from human-labeled data-sets. |
Deep Learning | Given layered models and huge amounts of data, developers aim to teach an AI model to "think" like a human and generate results that satisfy human needs. |
ANN | Artificial Neural Network: an attempt to mimic the human brain's neurons using programming languages. We link these artificial neurons together to create a neural network. |
Neural Network | Layers of neurons, the relationships between each layer, and how the connections are made between the neurons in the network. |
ANM | Artificial Neuron Model: a model of a single programmed artificial neuron, which we link with others to compose a neural network. |
The concept of mimicking the human brain is the vision that we will be exploring in depth, as it is the most popular approach used today. We will be exploring exactly how these artificial neurons are created in later sections of this article!
Machine Learning, Neurons... Isn't This Stuff Just Code Anyways?
Asking ourselves this question and exploring how to answer it is actually quite an insightful process. I am now going to conduct a thought experiment with you, and I hope you follow along! What is the point of all this hype around AI - isn't this all just code written by some software developer anyways? Yes... but no! It takes a highly educated person thousands of hours to enter the field of AI because of how complex the field is.
We previously talked about how AI aims to mimic human intelligence; now, we are going to explore what this actually means in the realm of programming, as well as its impact on the world. If you have an iPhone, pick up your phone and say:
"Hey Siri, is it going to rain today?"
Now, regardless of how you talk to your phone - differences in tonality (screaming at it, whispering to it), accent (British, American, South African, Indian, etc.) - your phone understood what you said and correctly responded to you. How does that work, exactly?
Without using AI, the only way to accomplish this would be for a group of programmers to record the frequencies of sound, correctly detect the syllables used, then piece each syllable together and map the result to a word in a sentence. This task is impossible to do at a large scale because of how sophisticated the problem is; think about it - human language and communication is such a complex problem that for a programmer to statically define every word would take billions of lines of code! Now, if you have written any code, you will know how frustrating it is to resolve a bug; it can take hours to identify the problem. Imagine doing this for a billion lines of code; it's impossible!
This is precisely why we need to give computers the ability to mimic human intelligence. We have solved this problem with AI by defining the characteristics and components of human speech (sound frequencies, tonality, words) and creating multiple ANNs. We then feed them human data (sound bites of people talking, alongside the actual transcript of what was said) and train the model so that the AI can LEARN by itself how to convert sound waves into actual human words!
AI is a revolutionary concept that has already begun to improve the world in many different fields. It is already being used to create new pharmaceutical compounds to cure illnesses, to detect tumors in the brain and body instantly (even during early stages of development), and to consume huge amounts of data and produce accurate reports that would take a human workforce thousands of years to complete! It is acting as a catalyst in almost every field of science available to date (physics, chemistry, mathematics, etc.) - and we should all be excited about the fruits it will bear for human advancement and our quality of life.
As a matter of fact, when it comes to narrow AI (task-specific challenges), computers are far better than humans at accomplishing those tasks. That said, humans are still better than machines at general-purpose reasoning.
Now... the Technical Stuff!
To preface this section before we dive in: there is some math involved in this article; however, we will not be going too deep into it. What is presented is actually quite simple, and almost everyone will be able to understand what's going on. I only ask that you give it a chance without skipping over it. The purpose of this section is to highlight the variables associated with the model and with achieving our end goal. All images included in this article can be expanded to full screen at will.

* Figure 3: Human brain vs Artificial neurons
Inside the human brain, we have over 100 billion neurons. Each neuron collects signals from input channels called dendrites, processes the information in its nucleus, and finally generates an output signal along a long thin branch called an axon. Now let's take a closer look at the artificial neuron:

* Figure 4: Single artificial neuron model
This is actually a very simple explanation of an artificial neuron. When it comes to developing a single-neuron model, we do the following:
We multiply the inputs (X1, X2, X3) by their respective weights (W1, W2, W3).
Then we add a bias signal (b); think of the bias as a value which can help us shift the outcome of the function. We add all of these values together and pass the sum off to an activation function.
There are several activation functions; each one takes the input from the previous step, performs a calculation, and passes the value on.
Let's see that math process in action using a visualization!

* Figure 5: Implementation of an artificial neuron
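If you would rather see that same computation as code, here is a minimal NumPy sketch of a single artificial neuron. The input values, weights, bias, and the choice of a sigmoid activation are purely illustrative assumptions, not values taken from the article's model:
import numpy as np

def sigmoid(z):
    # Squash the weighted sum into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Three example inputs, their weights, and a bias term
x = np.array([0.5, 0.2, 0.8])   # X1, X2, X3
w = np.array([0.4, 0.7, 0.1])   # W1, W2, W3
b = 0.3                         # bias signal

# Weighted sum, then the activation function
z = np.dot(x, w) + b
output = sigmoid(z)
print(output)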
Thinking beyond the model of a single neuron: we have just answered what is meant by "Artificial Neuron", but what about the "network" part of "Artificial Neural Network"? Similar to the human brain, each neuron is connected to other neurons.
Layer | A group of neurons which exist on the same level (X-Axis). |
Dense Network | When every neuron in one layer is connected to every neuron in the subsequent layer. |

* Figure 6: Multi-Layer Perceptron Network (Dense Network)
As referenced above, here is a great illustration of a dense neural network: each neuron in a layer is connected to every neuron in the subsequent layer. To solve the equation in the (expandable) image, we multiply the inputs (x) by the weights (w), sum them up, then apply the activation function (sigmoid). This generates an output, which is then fed to the next layer.
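To connect this back to code, here is a small NumPy sketch of one dense layer's forward pass. The layer sizes, the random weights, and the sigmoid activation are illustrative assumptions, not the article's actual model:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs feeding a dense layer of 4 neurons
x = np.array([0.5, 0.2, 0.8])   # outputs of the previous layer
W = np.random.rand(4, 3)        # one row of weights per neuron
b = np.random.rand(4)           # one bias per neuron

# Every neuron sees every input: multiply, sum, add the bias, activate
layer_output = sigmoid(W @ x + b)
print(layer_output)   # these four values become the inputs to the next layer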
If anything, I hope you've become familiar with some of the variables and their usage in the overall ANN - especially weights and biases. These are critical variables which we will be working with in the next section of this article. I've skipped over some concepts just to keep this section shorter (gradient descent and the various activation functions beyond the sigmoid) - however, you are free to do some personal research to fill in that part.
Programming Phase
Now for the stuff you've all been waiting for: we're going to build, train, and test an actual neural network to detect focal points of a face using Python! If you don't know Python, that's OK - it is a very easy language to understand, with tons of utility functions. However, there are some complex data conversions required, which will be explained in the video.
I am using Google Colab, a great resource provided by Google which gives everyone the ability to work in an online coding environment with access to GPUs and TPUs for optimizing the training of an AI model. Keep in mind that GPUs and TPUs are significantly faster than a typical CPU. You are able to access my Google Colab link by clicking HERE. Data-sets can also be accessed by downloading, extracting, and uploading the contents to your Google Drive. If you run into any issues - please feel free to shoot me a message over email or, preferably, on MS Teams!
Only major milestones have been included for the coding portion of this article; to see the full code, please navigate over to the Google Colab link provided and watch the video demonstration for a better understanding!
After accessing Google Colab, we first mount Google Drive so the notebook can read the data-set, and then import a large array of libraries.
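A minimal sketch of the mount step - /content/drive is Colab's standard mount point; the exact folder you read from afterwards depends on where you uploaded the data-set:
# Mount Google Drive inside the Colab runtime
from google.colab import drive
drive.mount('/content/drive')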
# Import the necessary packages
import pandas as pd
import numpy as np
import os
import PIL
import seaborn as sns
import pickle
from PIL import *
import cv2
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, LearningRateScheduler
from IPython.display import display
from tensorflow.keras import *
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, optimizers
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import *
from tensorflow.keras import backend as K
from keras import optimizers
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from google.colab.patches import cv2_imshow
I have included a table below describing what each library does:
Keras | Used to build and train our model |
Tensorflow | Used to build and train our model |
Pandas | Data-frame manipulation (Objects with Python) |
Numpy | Library for numerical analysis |
PIL | Helps us work with images and plotting |
Seaborn | Used for data visualization |
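Before the conversion step below, the CSV first has to be loaded into a Pandas data-frame. A minimal sketch, assuming the extracted data-set file is named data.csv and sits in a KeyFacialPoints folder on your Drive (both names are assumptions - adjust the path to wherever you uploaded the data):
# Hypothetical path - point this at your own upload location in Drive
keyfacial_df = pd.read_csv('/content/drive/MyDrive/KeyFacialPoints/data.csv')

# Quick sanity checks on the loaded data-frame
print(keyfacial_df.shape)
keyfacial_df.head()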
Following importation of the human-labeled data, we need to convert the image data for each row from a flat, single-dimensional string of pixel values into a two-dimensional image. This is because the data has been stored in CSV format, while an image is two-dimensional (X/Y axes). We can do so by executing the following code segment:
keyfacial_df['Image'] = keyfacial_df['Image'].apply(lambda x: np.fromstring(x, dtype = int, sep = ' ').reshape(96, 96))
This code segment uses a lambda function to split the contents of each item using a space delimiter (' ') and reshape the result into a two-dimensional 96x96 array. Notice that the image stays associated with the same index of the data-frame - this keeps the image connected to the other data provided (the labeled focal points). Next, we plot the original image using the plotting library and draw the focal points on top.

* Figure 7: Grid of faces plotted using Python and Google Colab; key focal points have been drawn with Python using the provided data.
The next step is recommended for developing a successful model: we augment the existing data-set to improve the overall accuracy of our model. Since our images are pictures of faces, we need to consider images that have been flipped horizontally (mirrored) and images at different brightness levels.
# Work on a copy of the data-frame so the original stays untouched
import copy
keyfacial_df_copy = copy.copy(keyfacial_df)

# Horizontal Flip - flip the images along the y axis
keyfacial_df_copy['Image'] = keyfacial_df_copy['Image'].apply(lambda x: np.flip(x, axis = 1))

# Since we are flipping horizontally, the y coordinate values stay the same;
# only the x coordinate values change - we subtract each initial x coordinate from the image width (96).
# 'columns' holds the keypoint column names (defined earlier in the Colab); even indices are x coordinates.
for i in range(len(columns)):
    if i % 2 == 0:
        keyfacial_df_copy[columns[i]] = keyfacial_df_copy[columns[i]].apply(lambda x: 96. - float(x))
- The output of the above code is the following:


* The original image compared with its horizontally flipped counterpart.
We also randomly scale the brightness of each picture (they are in grayscale), clipping pixel values to the 0-255 range.
import random
import copy

# Start again from the original data-frame and randomly brighten each image,
# clipping pixel values so they stay within the 0-255 grayscale range
keyfacial_df_copy = copy.copy(keyfacial_df)
keyfacial_df_copy['Image'] = keyfacial_df_copy['Image'].apply(lambda x: np.clip(random.uniform(1.5, 2) * x, 0.0, 255.0))

# augmented_df already holds the original + flipped images (concatenated earlier in the Colab)
augmented_df = np.concatenate((augmented_df, keyfacial_df_copy))
augmented_df.shape
As we have standardized each image to grayscale - 0 being complete darkness (no light) and 255 being completely bright - we only need to generate one random constant value and scale each pixel by that constant. We then save the modified (augmented) data to the same data-set to account for differences in brightness.
- To segment data, we do the following:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
- Generally speaking, we are aiming for the overall data-set split to look roughly like the following:

We are now done with data augmentation; next, we need to portion off segments of our data-set for testing and validation purposes. In short, we hold back some data and evaluate the model against it after each epoch so we can see how well it generalizes over time. In addition, we set aside one portion of the data-set entirely to measure the overall accuracy of the model on data it has never seen before!
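For context, the X and y passed to train_test_split above are assembled from the augmented data we built earlier. Here is a rough sketch of how that might be done; the column layout (30 keypoint columns followed by the image column), the 96x96 grayscale shape, and the normalization step are assumptions that mirror the Colab, so check the notebook for the exact code:
# Assumed layout: columns 0-29 hold keypoint coordinates, column 30 holds the 96x96 image array
img = augmented_df[:, 30]

# Stack the images into a (num_samples, 96, 96, 1) array and scale pixels to the 0-1 range
X = np.zeros((len(img), 96, 96, 1))
for i in range(len(img)):
    X[i] = np.expand_dims(img[i], axis=2)
X = X / 255.0

# The 30 keypoint coordinates become the regression targets
y = augmented_df[:, :30].astype('float32')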
Creating our Neural Network (Intermediate Theory)
Now that we have completed data augmentation, preparation, and data-set segmentation, we are ready to start building the neurons and the different layers that belong in our ANN. This part of the article is the most difficult to wrap your head around. There is a large code segment provided in the Google Colab which I will not be referencing here. However, I will use this opportunity to explore convolutional networks and how they work specifically with images:
- Performing abstractions on an image:

We take the image, then we apply convolutions. What we are trying to achieve is to extract features out of the image.
The small matrices that slide over the image to do this are called image kernels.
Afterwards we apply a pooling filter, which takes a feature map and generalizes/pools it to reduce the computational requirements.
Finally we flatten those pooled feature maps and feed them as inputs to the neural network.
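To make that pipeline concrete, here is a minimal Keras sketch of the convolution, pooling, and flatten steps described above. The layer sizes are purely illustrative assumptions - the actual model in the Colab is considerably larger - but the structure is the same idea:
# Illustrative only - not the article's actual model
cnn_sketch = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(96, 96, 1)),  # convolution: slide kernels to extract features
    layers.MaxPooling2D((2, 2)),                                            # pooling: shrink the feature maps
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                                       # flatten the feature maps into a vector
    layers.Dense(30)                                                        # 30 outputs: x/y for 15 facial keypoints
])
cnn_sketch.summary()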
I highly recommend that readers of this article navigate over to a fantastic resource provided by Carnegie Mellon University, which offers a web-based demonstration of how convolutional neural networks (CNNs) function. Click HERE.
Building, Training, and Using our Model!
Compile the model
adam = tf.keras.optimizers.Adam(learning_rate = 0.0001, beta_1 = 0.9, beta_2 = 0.999, amsgrad = False)
model_1_facialKeyPoints.compile(loss = "mean_squared_error", optimizer = adam, metrics = ['accuracy'])
Train the model (using Google GPU)
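Note that the fit call below passes a checkpointer callback that is defined earlier in the Colab. As a rough sketch, it is a Keras ModelCheckpoint along these lines (the file name here is an assumption):
# Save the best-performing weights seen so far during training
checkpointer = ModelCheckpoint(filepath = 'FacialKeyPoints_weights.hdf5', verbose = 1, save_best_only = True)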
history = model_1_facialKeyPoints.fit(X_train, y_train, batch_size = 64, epochs = 10, validation_split = 0.05, callbacks=[checkpointer])
Evaluate the model
result = model_1_facialKeyPoints.evaluate(X_test, y_test)
print("Accuracy: {}".format(result[1]))Note: During training, we can see that after only a short amount of time we started over-fitting - which means that the AI stopped learning/improving. This usually means we need to adjust the batch-size and number of epochs. You can see this illustrated in the graph below:

After the last step, we can see that we have achieved the following log file:
41/41 [==============================] - 1s 15ms/step - loss: 7.3915 - accuracy: 0.8349 Accuracy: 0.8348909616470337
Wow! In such a short amount of training time, we have achieved 83% accuracy with our model. Surely, with more data and augmentation, we would be able to achieve an even higher accuracy!
Playing with our Completed AI Model! (Last Step)
- Define our prediction function:
def predict(X_test):
    # Predict keypoints using the trained model
    df_predict = model_1_facialKeyPoints.predict(X_test)
    # Convert the predictions to a data-frame with the original column names
    df_predict = pd.DataFrame(df_predict, columns = columns)
    return df_predict
- Calling our prediction function and plotting the results:
# Now let's see for ourselves how accurate this model has gotten using test data (not used for training purposes)
df_predict = predict(X_test)

fig = plt.figure(figsize = (20, 20))
for i in range(16):
    ax = fig.add_subplot(4, 4, i + 1)
    image = plt.imshow(X_test[i].squeeze(), cmap = 'gray')
    # Each prediction row holds 30 values: x/y pairs for the 15 keypoints
    for j in range(1, 31, 2):
        plt.plot(df_predict.loc[i][j-1], df_predict.loc[i][j], 'rx')
Note: These are images that the model has never seen before and that were not used during training. You will see that there are some errors - for example, face #3 - and on higher-contrast images it is slightly less accurate at detecting focal points. However, overall, this turned out great!

The End.
That's all, folks! If you have any questions, please feel free to reach out to me over MS Teams or through my email (in the footer of this blog).

