
Video Facial Expression and Awareness Detection with Fast.ai and OpenCV

Take your trained image classification model, make it work on live videos or video files, and add facial landmarking!

The inspiration behind this? The FBI agent watching me through my webcam, but replaced by deep learning.

Introduction

The goal of this tutorial? Train a facial expression classification model with the fast.ai library, read facial expressions from your webcam or a video file, and finally, add facial landmarking to track your eyes and determine awareness. (TL;DR: the fully working code is here: https://github.com/jy6zheng/FacialExpressionRecognition )

The main reason I wrote this tutorial is that, when working on this project, a big challenge was figuring out how to take my trained classifier and make it work efficiently on both live video and video files. The additional eye landmarking feature was based on this tutorial, which I found extremely useful: https://www.pyimagesearch.com/2017/05/08/drowsiness-detection-opencv/

Training

The first step is to train an image classification model with a convolutional neural network. The data I used was from https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset

I used the fast.ai library, which is built on top of PyTorch, to train my classification model. The model was fine-tuned from resnet34 pre-trained weights on the training dataset and exported as a .pkl file. For step-by-step instructions, check out the Google Colab notebook in my repository, which contains all the code to train your model: https://github.com/jy6zheng/FacialExpressionRecognition
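
If you want a sense of what that notebook does, here is a minimal fast.ai v1 training sketch. The folder names, image size, and number of epochs are assumptions based on the Kaggle dataset layout, not the exact notebook code:

from fastai.vision import *

path = Path('images')  # dataset root containing train/ and validation/ subfolders (assumed layout)
data = (ImageDataBunch.from_folder(path, train='train', valid='validation',
                                   ds_tfms=get_transforms(), size=48, bs=64)
        .normalize(imagenet_stats))
learn = cnn_learner(data, models.resnet34, metrics=error_rate)  # resnet34 pre-trained weights
learn.fit_one_cycle(4)   # a few epochs of one-cycle training; tune as needed
learn.export()           # writes export.pkl next to the data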

The greatest challenge was first finding a public dataset and then cleaning the data. Initially, using the Kaggle dataset as-is, I was only able to train down to an error rate of 0.328191, which meant the model was only correct around 67% of the time (not great at all). When I plotted the images that produced the top losses, I quickly realized that a large amount of the data was incorrectly labeled (the label on the left is the expression predicted by the model; the one on the right is the dataset's label).
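
The top-losses plot comes from fast.ai's interpretation tools. A short sketch of how such a plot (and the confusion matrix shown further below) can be produced in fast.ai v1:

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(10, 10))  # predicted vs. actual label for the worst predictions
interp.plot_confusion_matrix()               # which expressions get mistaken for which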

The girl in the bottom row, third from the left, clearly does not look happy

After cleaning the data, the error rate decreased by over 16 percentage points. The classifier now has around 84% accuracy, meaning it correctly identifies 84% of the face images. There is still some incorrectly labeled and dirty data, so there is room for further improvement.
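
For the cleaning step, one option (only an assumption about how you might do it, and it runs inside a Jupyter notebook) is fast.ai v1's dataset-cleaning widget:

from fastai.widgets import DatasetFormatter, ImageCleaner

ds, idxs = DatasetFormatter().from_toplosses(learn)  # order the training images by loss
ImageCleaner(ds, idxs, path)                         # interactive widget to delete or relabel images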

As you can see, neutral and sad faces understandably get confused the most

Using Trained Model on Live Videos

Now it is time to take our classifier and use it on live video streams. First, it is preferable to create a virtual environment so that this project has its own dependencies and does not interfere with any other projects. Then, install the required packages and libraries. Create a file called liveVideoFrameRead.py (or whatever you want to name it) and import the following:

from scipy.spatial import distance as dist
import numpy as np
import cv2
from imutils import face_utils
from imutils.video import VideoStream
from fastai.vision import *
import imutils
import argparse
import pandas as pd  # used later to save predictions to a .csv file
import time
import dlib
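
If you are setting up the environment from scratch, the packages above can be installed with pip. The version pin is only a suggestion (fastai 1.x is assumed because the code uses the v1 load_learner API), and note that dlib needs CMake and a C++ compiler to build:

pip install fastai==1.0.61 opencv-python imutils dlib scipy pandas numpy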

I wanted the option to save the predictions in a .csv file and save the marked-up video, so I added argument parsing. I also exported the trained classification model and moved it to my working directory.

ap = argparse.ArgumentParser()
ap.add_argument("--save", dest="save", action="store_true")
ap.add_argument("--no-save", dest="save", action="store_false")
ap.set_defaults(save=False)
ap.add_argument("--savedata", dest="savedata", action="store_true")
ap.add_argument("--no-savedata", dest="savedata", action="store_false")
ap.set_defaults(savedata=False)
args = vars(ap.parse_args())

path = '/Users/joycezheng/FacialRecognitionVideo/'  # change this depending on the path of your exported model
learn = load_learner(path, 'export.pkl')

Great! Now it is time to start our video stream. I used VideoStream from imutils.video since I found it works faster than cv2.VideoCapture. Note: the source for the video stream is 0 for the built-in webcam; it will be different if you are using another camera, such as a plug-in USB webcam.

A Haar cascade classifier is used to identify frontal faces in the video frame. We have an array named data to store our predictions. The timer and time_value are used to label the time of each prediction, so that the predictions increment by one second in the .csv file.

face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml") 
vs = VideoStream(src=0).start()
start = time.perf_counter() 
data = []
time_value = 0
if args["save"]: 
 out = cv2.VideoWriter(path + "liveoutput.avi", cv2.VideoWriter_fourcc('M','J','P','G'), 10, (450,253))

Now, we will implement a while loop that reads each frame from the video stream:

  1. Each frame is converted to grayscale, since the image classifier was trained on grayscale images.
  2. The cascade classifier is used to find faces in the frame. I set the minNeighbors parameter to 5 since I found it worked best on live video. For recorded video files, I set it to a higher value, since a face is guaranteed to be in each frame.
  3. The grayscale image is then cropped around each face with a buffer of 0.3, since our classifier was trained on close-up faces without much background.
  4. Text and bounding boxes are then drawn onto each frame, and the frame is shown.
  5. Each frame is then saved to the video writer using out.write(frame).
while True:
    frame = vs.read()
    frame = imutils.resize(frame, width=450)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    face_coord = face_cascade.detectMultiScale(gray, 1.1, 5, minSize=(30, 30))
    for coords in face_coord:
        X, Y, w, h = coords
        H, W, _ = frame.shape
        # crop the face with a 0.3 buffer around the detected bounding box
        X_1, X_2 = (max(0, X - int(w * 0.3)), min(X + int(1.3 * w), W))
        Y_1, Y_2 = (max(0, Y - int(0.3 * h)), min(Y + int(1.3 * h), H))
        img_cp = gray[Y_1:Y_2, X_1:X_2].copy()
        # scale pixel values to [0, 1] before passing the crop to the fast.ai model
        prediction, idx, probability = learn.predict(Image(pil2tensor(img_cp, np.float32).div_(255)))
        cv2.rectangle(
            img=frame,
            pt1=(X_1, Y_1),
            pt2=(X_2, Y_2),
            color=(128, 128, 0),
            thickness=2,
        )
        cv2.putText(frame, str(prediction), (10, frame.shape[0] - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    cv2.imshow("frame", frame)
    if args["save"]:
        out.write(frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

vs.stop()
if args["save"]:
    print("done saving video")
    out.release()
cv2.destroyAllWindows()

Now we have our fast.ai learning model working with imutils and OpenCV to predict faces from live videos!

Next, it is time to determine the awareness of the face. The function eye_aspect_ratio calculates the eye aspect ratio from the coordinates of the eye. The position and coordinates of each eye are found with the dlib pre-trained facial landmark detector. The function data_time appends predictions to the data array at one-second intervals.
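
For reference, the eye aspect ratio used in the drowsiness-detection tutorial linked above is EAR = (||p2 − p6|| + ||p3 − p5||) / (2 ||p1 − p4||), where p1 through p6 are the six landmarks dlib returns around one eye; the ratio stays roughly constant while the eye is open and drops toward zero as it closes.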

EYE_AR_THRESH = 0.20
EYE_AR_CONSEC_FRAMES = 10
COUNTER = 0

def eye_aspect_ratio(eye):
    # vertical distances between the eye landmarks
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    # horizontal distance between the eye landmarks
    C = dist.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear

def data_time(time_value, prediction, probability, ear):
    # append one row of data per elapsed second
    current_time = int(time.perf_counter() - start)
    if current_time != time_value:
        data.append([current_time, prediction, probability, ear])
        time_value = current_time
    return time_value

predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]

Within the for loop that iterates over the face coordinates, add the following block of code. The eyes are located with the dlib face landmark detector and drawn onto the frame. When the average eye aspect ratio of the two eyes stays below the threshold for more than ten consecutive frames (a value you can modify to your liking), the face is marked as distracted.

        rect = dlib.rectangle(X, Y, X + w, Y + h)
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)
        ear = (leftEAR + rightEAR) / 2.0
        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)
        if ear < EYE_AR_THRESH:
            COUNTER += 1
            if COUNTER >= EYE_AR_CONSEC_FRAMES:
                cv2.putText(frame, "Distracted", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        else:
            COUNTER = 0
        cv2.putText(frame, "Eye Ratio: {:.2f}".format(ear), (250, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        time_value = data_time(time_value, prediction, probability, ear)

Finally, at the bottom of our code, we can save the data as a data frame and then a .csv file.

if args["savedata"]:
    df = pd.DataFrame(data, columns=['Time (seconds)', 'Expression', 'Probability', 'EAR'])
    df.to_csv(path + 'exportlive.csv')
    print("data saved to exportlive.csv")

You can test the code from the command line by running:

python liveVideoFrameRead.py --save --savedata

The full live-video code is in the GitHub repository linked above (liveVideoFrameRead.py).

Using Trained Model on Video Files

I used a very similar approach for video files as for live video. The main difference is that predictions are made only every set number of frames, which can be adjusted with the --frame-step command-line argument. A sketch of the approach is below, and the full code is in the GitHub repository linked above.
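
As a rough sketch of the frame-step idea, assuming the same face_cascade and learn objects as in the live-video script (argument names like --video and --frame-step here are illustrative, not necessarily the exact flags in the repository):

cap = cv2.VideoCapture(args["video"])  # path to the input video file (assumed argument)
frame_count = 0
while cap.isOpened():
    grabbed, frame = cap.read()
    if not grabbed:  # end of the video file
        break
    frame_count += 1
    if frame_count % args["frame_step"] != 0:  # only classify every --frame-step frames
        continue
    frame = imutils.resize(frame, width=450)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (X, Y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 8, minSize=(30, 30)):
        H, W = gray.shape
        X_1, X_2 = max(0, X - int(0.3 * w)), min(X + int(1.3 * w), W)
        Y_1, Y_2 = max(0, Y - int(0.3 * h)), min(Y + int(1.3 * h), H)
        face = gray[Y_1:Y_2, X_1:X_2].copy()
        prediction, idx, probability = learn.predict(Image(pil2tensor(face, np.float32).div_(255)))
cap.release()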

And that’s it! You are now able to predict facial expressions from both video files and live webcams.

Thank you for reading this :), and let me know if there are any improvements or questions. The fully working code is here: https://github.com/jy6zheng/FacialExpressionRecognition

Live Video Classifier Demo
