Deep neural networks being Racist at facial detection

Aakash Gupta
5 min read · Aug 8, 2020


  1. Introduction
  2. Experiment with MTCNN
  3. Why some Neural Networks give biased results and some proposed solutions
  4. Conclusion
  5. Credits and links for further reading


Introduction

While I was taking the Fast-AI Practical Deep Learning for Coders course, Jeremy Howard (the course instructor) mentioned several alumni of the course, among them Melissa Fabros. Melissa works with Kiva, a micro-lending organization active in Africa, and helped it build its own facial recognition system because the existing systems were 30–40 times worse at recognizing black women than white men. According to Jeremy, “In fact, I think it was IBM system was like 99.8% accurate on common white face men versus 65% accurate on dark skinned women. So it’s like 30 or 40 times worse for black women versus white men. This is really important because for Kiva, black women perhaps are the most common user base for their micro lending platform.” Links to a few such studies can be found at the end of the article. This made me curious, and I wanted to find out why computers show such bias.

Caution message on the download page of VGG Face Data set

Experiment with MTCNN

I downloaded MTCNN (Multi-task Cascaded Convolutional Networks), a pre-trained model that performs facial detection. MTCNN uses three networks. First, the P-net (Proposal Network) is a shallow network that proposes many windows that may contain faces.

From the MTCNN paper. NMS stands for Non-Maximum Suppression

The second network, the R-net (Refinement Network), refines the output of P-net and rejects many of the non-face windows. Lastly, the O-net (Output Network) refines the output of R-net and also gives the facial landmark positions.
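Non-Maximum Suppression is what merges the overlapping candidate windows produced at each stage into a single box per face. Below is a minimal pure-Python sketch of the greedy version of the idea. It is not the code used inside MTCNN; the `iou_threshold` of 0.5 and the example boxes are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative value, not MTCNN's setting
) -> List[int]:
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate windows around one face, plus a separate window.
candidates = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.90, 0.80, 0.75]
print(non_max_suppression(candidates, scores))  # prints [0, 2]
```

The second window overlaps the first with an IoU of about 0.82, so it is dropped, while the third window does not overlap anything and is kept.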

I selected MTCNN for the experiment for two reasons: first, it was trained on the WIDER-FACE and CelebA data sets, which I took to be more exhaustive than the VGG Face data set; second, it is very fast, even on a CPU.

I downloaded a PyTorch implementation of MTCNN from GitHub and ran many videos through it. The videos featured a wide variety of ethnic groups from around the world.
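The exact script used for the experiment is not reproduced here. As a rough sketch, and assuming the implementation is the facenet-pytorch repository listed in the credits, counting faces frame by frame could look like the following. The helper names `load_mtcnn` and `faces_per_frame` and the `min_prob` cut-off of 0.9 are hypothetical choices made for illustration.

```python
from typing import Any, Iterable, List


def load_mtcnn(device: str = "cpu"):
    """Build a detector from facenet-pytorch (needs `pip install facenet-pytorch`)."""
    # Imported lazily so that the counting helper below has no hard dependency.
    from facenet_pytorch import MTCNN
    return MTCNN(keep_all=True, device=device)


def faces_per_frame(
    frames: Iterable[Any],
    detector: Any,
    min_prob: float = 0.9,  # hypothetical confidence cut-off
) -> List[int]:
    """Count, for each frame, the detections with confidence >= min_prob.

    `detector` is any object with a `detect(frame)` method that returns
    `(boxes, probs)`, with `boxes` set to None when no face is found.
    """
    counts: List[int] = []
    for frame in frames:
        boxes, probs = detector.detect(frame)
        if boxes is None:  # no face found in this frame
            counts.append(0)
            continue
        counts.append(sum(1 for p in probs if p is not None and p >= min_prob))
    return counts
```

With the library installed, `faces_per_frame(frames, load_mtcnn())` would return one count per frame, where `frames` is an iterable of RGB images (for example PIL images) decoded from a video with whatever tool you prefer. Frames where the count drops below the number of visible faces are the ones worth inspecting by hand.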

One of the many videos I tracked while conducting the experiment.

Where MTCNN missed some faces

The conclusion of the experiment is that, for practical purposes, MTCNN facial detection worked well and I saw no noticeable gap between groups. The experiment was not exhaustive, but it was enough to reflect how most users would use the detector in practice.
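The comparison in this experiment was made by eye. One way to turn "no noticeable gap" into a number, assuming each face in a sample of frames has been hand-labelled with a group and with whether the detector found it, is a per-group detection rate. The records below are made up purely for illustration and are not results from the experiment.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def detection_rate_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """records: one (group_label, was_detected) pair per hand-labelled face."""
    seen: Dict[str, int] = defaultdict(int)
    hit: Dict[str, int] = defaultdict(int)
    for group, detected in records:
        seen[group] += 1
        hit[group] += int(detected)
    return {group: hit[group] / seen[group] for group in seen}


# Made-up labels, not measurements from the videos.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]
rates = detection_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A gap close to zero on a reasonably large labelled sample would support the visual impression; a large gap would contradict it.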

Why some Neural Networks give biased results and some proposed solutions

Even though MTCNN gave good results on our test data, we cannot deny that some networks give biased results, as many studies have shown.

Comparison of some popular data sets

As the comparison makes clear, data sets are generally not balanced and, as one would expect, networks trained on such data show the same disparity when used. The question that arises is why such disparity appears in the data sets. It is mostly because of the way the data is collected from various online sources. For example, take PubFig: Public Figures Face Database, which contains 58,797 images of 200 people, taken randomly from the internet like most data sets. Internet access is growing rapidly but is still not available to everyone, so in my view collecting data randomly from the internet is one of the reasons for the 30% to 60% gap in the data sets.
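To check the balance of a data set yourself, the per-group share can be computed directly from its annotations. This is a small sketch that assumes one group label per image; the `lighter`/`darker` labels and the 80/20 split are hypothetical and not taken from any of the data sets mentioned.

```python
from collections import Counter
from typing import Dict, Iterable


def group_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of the images that belongs to each annotated group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical annotations, one label per image.
labels = ["lighter"] * 80 + ["darker"] * 20
shares = group_shares(labels)
# How many times larger the biggest group is than the smallest one.
imbalance = max(shares.values()) / min(shares.values())
print(shares, imbalance)
```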

But not all data sets are gathered randomly from the web. AgeDB, for example, is a manually collected data set, yet it still shows a disparity in diversity. In my view the best explanation is ignorance: when compiling such large data sets, one can overlook this issue in order to meet larger goals in a shorter period of time.

The solution to this issue seems to be awareness of it and, obviously, the ability to create a balanced data set. DiF, short for Diversity in Faces, is a data set created by IBM. The researchers were well aware of the disparity in existing data sets and so created this new data set from the YFCC-100M images. They annotated 1M facial images using 10 facial coding schemes in their pursuit of greater diversity coverage in the field of AI.

Another solution could be to fill the diversity gap using Generative AI. We can generate images to add variety and increase diversity in our data sets, but this is only possible when we have some diversity metric for the data set. That is where IBM’s annotation literature could come in handy and help us narrow the gap.
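As a sketch of what such a diversity metric could look like, the snippet below uses normalised Shannon entropy (evenness) over group counts and then works out how many extra images each group would need to match the largest one. Shannon evenness is my own choice for illustration and is not presented as IBM's metric. The group names and counts are hypothetical, and the counts are assumed to be non-empty with at least one image.

```python
import math
from typing import Dict


def shannon_evenness(counts: Dict[str, int]) -> float:
    """Shannon entropy of the group distribution, scaled to [0, 1].

    1.0 means every group is equally represented; values near 0 mean
    one group dominates.
    """
    if len(counts) < 2:
        return 1.0  # a single group is trivially "even"
    total = sum(counts.values())
    probs = [n / total for n in counts.values() if n > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


def images_needed_to_balance(counts: Dict[str, int]) -> Dict[str, int]:
    """Extra images per group needed to match the largest group."""
    target = max(counts.values())
    return {group: target - n for group, n in counts.items()}


counts = {"group_a": 600, "group_b": 300, "group_c": 100}  # hypothetical
print(round(shannon_evenness(counts), 3))
print(images_needed_to_balance(counts))
```

The second function gives a target for how many images a generative model would have to contribute per group; whether generated faces are a good substitute for real ones is a separate question that the metric does not answer.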


Conclusion

In this discussion we established that the main reason behind the racist behavior of our neural networks is the imbalanced nature of the data sets they are trained on. The imbalance is mainly due to ignorance and to the lack of internet access among some underprivileged sections of society. Through our experiment, however, we saw that some networks, like MTCNN, still perform fine in practical scenarios. I believe the high accuracy of MTCNN is largely due to the fact that it was trained on very large data sets. Also, awareness among researchers has led to studies like Diversity in Faces by IBM, which will accelerate the diversity coverage of AI and enable developers to use tools like Generative AI to fill the gap.


Credits and links for further reading

[1] IBM Diversity in Faces

[2] Fast-AI course

[3] Facenet-pytorch repository

[4] MTCNN Paper

[5] AgeDB Paper

Links to some related reports in the media

[1] NYTimes


[3] TheVerge

A compilation of all the videos I used in the experiment can be found on YouTube here.