<aside> 🔗 Jupyter Notebook Link

</aside>

In this project we explore several neural network architectures for predicting the locations of facial keypoints on human faces.

Note: every train/validation split in this project is 80/20.

Part 1: Nose Tip Detection

We start with the simpler problem of predicting only the location of a person's nose tip. For this part we use the IMM Face Dataset, which contains 240 facial images of 40 people. The images of the first 32 people form our training set, and the images of the remaining 8 people form our validation set.
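
The person-level split described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `(person_id, image_id)` index, the `split_by_person` helper, and the assumption that IDs run from 1 to 40 with 6 images per person are all hypothetical.

```python
def split_by_person(samples, n_train_people=32):
    """Split (person_id, item) pairs so that no person appears in both sets."""
    train = [s for s in samples if s[0] <= n_train_people]
    val = [s for s in samples if s[0] > n_train_people]
    return train, val


# Placeholder index (assumption): 40 people with 6 images each, 240 in total.
samples = [(person, image) for person in range(1, 41) for image in range(1, 7)]

train_samples, val_samples = split_by_person(samples)
print(len(train_samples), len(val_samples))  # 192 48
```

Splitting by person rather than by image keeps every face in the validation set unseen during training.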

Preprocessed IMM Face Dataset Images

Because our training set is small, we preprocess the IMM Face Dataset images by converting them to grayscale, normalizing the pixel values, and downsizing them to 80x60. The green dot marks the ground-truth nose-tip label that we are trying to predict.
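
A rough sketch of these three preprocessing steps, using only NumPy. Several details are assumptions rather than facts from this write-up: the input is taken to be a 640x480 RGB image, "80x60" is read as width 80 and height 60, pixel values are normalized to the range [-0.5, 0.5], and downsizing is done by block averaging instead of a library resize call. The `preprocess` function name is hypothetical.

```python
import numpy as np


def preprocess(rgb, out_h=60, out_w=80):
    """Convert an (H, W, 3) uint8 RGB image to a normalized (out_h, out_w) grayscale array.

    Assumes H and W are integer multiples of out_h and out_w (e.g. 480x640).
    """
    # Grayscale via the standard luminance weights.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights

    # Normalize from [0, 255] to [-0.5, 0.5] (assumed normalization scheme).
    gray = gray / 255.0 - 0.5

    # Downsize by averaging non-overlapping blocks of pixels.
    h, w = gray.shape
    fh, fw = h // out_h, w // out_w
    gray = gray[: fh * out_h, : fw * out_w]
    return gray.reshape(out_h, fh, out_w, fw).mean(axis=(1, 3))


# Usage with a placeholder all-white image in place of a real dataset image.
white = np.full((480, 640, 3), 255, dtype=np.uint8)
small = preprocess(white)
print(small.shape)  # (60, 80)
```

If the keypoint labels are stored as fractions of the image width and height, they need no rescaling after the resize; if they are stored in pixels, they would have to be scaled by the same factor.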

Neural Network, Loss, & Optimizer

For this first part our neural network is quite simple: three convolutional layers, each paired with a ReLU and a max-pool layer, followed by two fully connected layers.
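
One possible layout matching that description is sketched below. The channel counts (12, 24, 32), the 3x3 kernels with padding, the hidden size of 128, and the class name `NoseNet` are illustrative assumptions; the write-up does not state the actual values. The input is assumed to be a single-channel 60x80 image.

```python
import torch
from torch import nn


class NoseNet(nn.Module):
    """Three conv + ReLU + max-pool stages, then two fully connected layers."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=3, padding=1),   # 1x60x80 -> 12x60x80
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 12x30x40
            nn.Conv2d(12, 24, kernel_size=3, padding=1),  # -> 24x30x40
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 24x15x20
            nn.Conv2d(24, 32, kernel_size=3, padding=1),  # -> 32x15x20
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x10
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 10, 128),
            nn.ReLU(),
            nn.Linear(128, 2),  # (x, y) of the nose tip
        )

    def forward(self, x):
        return self.regressor(self.features(x))


# Shape check on a placeholder batch of four blank images.
model = NoseNet()
out = model(torch.zeros(4, 1, 60, 80))
print(out.shape)  # torch.Size([4, 2])
```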

For the loss we use nn.MSELoss() since we are dealing with a regression problem.

For the optimizer we use Adam with a learning rate of 1e-3.

During training I used a batch size of 16.
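
Putting the loss, optimizer, and batch size together, a training loop might look like the sketch below. The `fit` helper, the random placeholder tensors, and the tiny stand-in model are assumptions made so the example runs on its own; they are not the notebook's data or network.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def fit(model, train_ds, val_ds, epochs=25, batch_size=16, lr=1e-3):
    """Train with MSE loss and Adam; return per-epoch train/validation losses."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=batch_size)
    history = {"train": [], "val": []}

    for _ in range(epochs):
        model.train()
        total = 0.0
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
        history["train"].append(total / len(train_ds))

        model.eval()
        total = 0.0
        with torch.no_grad():
            for images, targets in val_loader:
                total += criterion(model(images), targets).item() * images.size(0)
        history["val"].append(total / len(val_ds))

    return history


# Placeholder data and stand-in model (assumptions), run for 2 epochs only.
torch.manual_seed(0)
images = torch.randn(48, 1, 60, 80)
targets = torch.rand(48, 2)
train_ds = TensorDataset(images[:32], targets[:32])
val_ds = TensorDataset(images[32:], targets[32:])
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(60 * 80, 2))

history = fit(stand_in, train_ds, val_ds, epochs=2)
```

The returned `history` dictionary holds one average loss per epoch for each split, which is the data needed for a train/validation loss plot.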

Train and Validation Losses

After training for 25 epochs, we can visualize the train and validation losses:

We can also visualize how our network performs on new samples:

As we can see, the network tends to get pretty close on images where the person is looking straight ahead, e.g. samples #34 and #36. Unfortunately, when the person is looking to the side, the network struggles to predict the nose tip, e.g. samples #15 and #5.