Post-Hoc Methods for Debiasing Neural Networks

In this post, we give an introduction to bias in machine learning, and we discuss our new research for debiasing pretrained neural networks (i.e., post-hoc debiasing).
ArXiv paper:
Source code:

The last decade has seen a huge rise in machine learning applications. Many of these algorithms are now deployed in high-stakes scenarios such as loan repayment prediction, fraud detection, hiring decisions, and criminal recidivism prediction [1, 2, 3, 4]. There are clear advantages to using machine learning algorithms. For example, thousands of datapoints with many features can be processed in the blink of an eye. However, these algorithms are susceptible to bias against individuals or groups of people, and this bias can arise from a variety of sources [5, 6]. For instance, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is a software tool that estimates the risk of a defendant committing a future crime. Judges in the United States consult the software when deciding whether a defendant should be granted bail or pretrial release. This software was found to be biased against African-Americans [7].

The models often end up biased because the training datasets themselves are biased. This bias can come from multiple sources, including:
(1) Sampling biases: the data could be imbalanced across different features or groups of people (e.g., more policing in predominantly African-American neighborhoods).
(2) Historical biases: certain groups of people have historically been discriminated against, and this can be reflected in the dataset (e.g., African-Americans receiving harsher sentences for the same crimes).

Using technology that makes prejudiced decisions about life-changing events may deepen the divides that already exist in society. Therefore, over the last several years, the machine learning community has put substantial effort into creating fair machine learning algorithms. Dozens of formal definitions of fairness have been proposed [4]. For example, the equal opportunity definition states that the true positive rates in the protected and unprotected classes must be the same. Here, the True Positive Rate (TPR) of a classifier is the fraction of datapoints with a positive label that were also predicted to have a positive label.
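To make the equal opportunity definition concrete, here is a minimal sketch in plain Python that computes the TPR for each group and the gap between them. The function names, the toy data, and the sign convention (unprotected TPR minus protected TPR) are assumptions made for illustration and are not taken from our paper or source code.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of positively labeled datapoints that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        raise ValueError("no positive labels in this group")
    return sum(positives) / len(positives)


def equal_opportunity_difference(y_true, y_pred, protected):
    """TPR of the unprotected group minus TPR of the protected group.

    A value of 0 means equal opportunity holds exactly on this data.
    The sign convention is an assumption chosen for this sketch.
    """
    def group(flag):
        idx = [i for i, g in enumerate(protected) if g == flag]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    tpr_protected = true_positive_rate(*group(1))
    tpr_unprotected = true_positive_rate(*group(0))
    return tpr_unprotected - tpr_protected


# Toy labels and predictions, made up for illustration.
y_true    = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred    = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
protected = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = protected group

gap = equal_opportunity_difference(y_true, y_pred, protected)
print(gap)  # 0.25: protected TPR is 2/4, unprotected TPR is 3/4
```

In this toy example the classifier finds three of the four positives in the unprotected group but only two of the four in the protected group, so it violates equal opportunity.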


There are also dozens of proposed algorithms for fairness. The majority of fair algorithms are in-processing algorithms, which take as input a training dataset and then train a new, fairer model from scratch. However, this is not always practical. For example, recent neural networks such as XLNet or GPT-3 can take weeks to train and are very expensive. Additionally, for some applications, the full training set may no longer be available due to regulatory or privacy requirements.

In contrast, post-hoc methods take as input a pretrained model and a smaller validation dataset, and then debias the model through fine-tuning or post-processing. This type of algorithm has not received as much attention in the research community.

Three New Post-Hoc Debiasing Methods

In this work, we present three new post-hoc debiasing algorithms. Our first technique is a simple algorithm, random perturbation, which iteratively adds multiplicative noise to the weights of the original neural network and outputs the perturbation that best minimizes bias while maximizing accuracy. In other words, the algorithm makes small changes to the neural network and then checks whether the bias has decreased.
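The random perturbation idea can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not our released implementation: the "network" is a stand-in linear classifier, the objective (a weighted sum of the TPR gap and the error rate, with hypothetical weight `lam`), the Gaussian noise centered at 1 with spread `sigma`, and the toy validation data are all choices made for this example.

```python
import random


def predict(weights, x):
    """Stand-in 'network': a linear classifier; the last weight is the bias term."""
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return 1 if z >= 0 else 0


def objective(weights, X, y, protected, lam=0.75):
    """Hypothetical objective: lam * |TPR gap| + (1 - lam) * error rate (lower is better)."""
    preds = [predict(weights, x) for x in X]
    error = sum(p != t for p, t in zip(preds, y)) / len(y)

    def tpr(flag):
        hits = [p for p, t, g in zip(preds, y, protected) if t == 1 and g == flag]
        return sum(hits) / len(hits) if hits else 0.0

    bias = abs(tpr(0) - tpr(1))
    return lam * bias + (1 - lam) * error


def random_perturbation(weights, X, y, protected, iters=200, sigma=0.1, seed=0):
    """Try many noisy copies of the original weights and keep the best one."""
    rng = random.Random(seed)
    best_w, best_score = list(weights), objective(weights, X, y, protected)
    for _ in range(iters):
        # Multiply every original weight by noise drawn around 1.
        candidate = [w * rng.gauss(1.0, sigma) for w in weights]
        score = objective(candidate, X, y, protected)
        if score < best_score:
            best_w, best_score = candidate, score
    return best_w, best_score


# Toy validation set and made-up "pretrained" weights, for illustration only.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
protected = [data_rng.randint(0, 1) for _ in X]
y = [1 if x[0] + 0.5 * x[1] > 0 else 0 for x in X]
w0 = [1.0, -0.3, 0.2]

w_best, score = random_perturbation(w0, X, y, protected)
print(objective(w0, X, y, protected), score)
```

Because the search keeps the original weights unless a perturbation scores strictly better, the returned model is never worse than the starting point on the validation objective.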

Our second technique is a layer-wise optimization algorithm. In this approach, we iteratively choose a layer of the neural network and use Gradient-Boosted Regression Trees (GBRT) to optimize the weights of the chosen layer with respect to bias and accuracy. This is a more powerful optimization technique than random perturbation, but it is more computationally intensive, so it can only run on individual layers instead of the entire neural network.

Our last technique is an adversarial fine-tuning algorithm. Adversarial training is one of the most interesting and powerful machine learning techniques, famously used in GANs (see our earlier blog post for an overview). Our algorithm trains a new neural network, a discriminator, to predict the bias of the original neural network. This effectively gives us a differentiable proxy for the bias loss, enabling the use of first-order optimization techniques such as gradient descent to fine-tune the original neural network. Adversarial learning has recently been proposed as an in-processing method for debiasing, but it had not been used as a post-hoc debiasing method before our work. Adversarial fine-tuning is the most powerful of our three post-hoc algorithms, but it is often the most computationally intensive.



We compare the three techniques above with three popular post-processing algorithms from prior work. We run experiments with three fairness definitions and three popular fairness datasets from the AIF360 toolkit: COMPAS, Bank Marketing, and Adult Census Income. We also run experiments with several initial neural networks of various depths and widths. Our experiments show two trends: (1) our post-hoc methods outperform all three post-processing methods on average; (2) the random perturbation algorithm is a strong baseline, while the adversarial fine-tuning algorithm performs well on larger neural networks.


[Figure: Pareto plot for the Bank Marketing dataset (protected attribute: age), statistical parity difference (SPD).]



[1] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness in machine learning. NIPS Tutorial, 2017.

[2] Miranda Bogen and Aaron Rieke. Help wanted: An examination of hiring algorithms, equity, and bias, 2018.

[3] Amitabha Mukerjee, Rita Biswas, Kalyanmoy Deb, and Amrit P Mathur. Multi-objective evolutionary algorithms for the risk-return trade-off in bank loan management. International Transactions in Operational Research, 2002.

[4] Eric WT Ngai, Yong Hu, Yiu Hing Wong, Yijun Chen, and Xin Sun. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3):559–569, 2011.

[5] Executive Office of the President, Cecilia Munoz (Director, Domestic Policy Council), Megan Smith (US Chief Technology Officer, Office of Science and Technology Policy), and DJ Patil (Deputy Chief Technology Officer for Data Policy and Chief Data Scientist, Office of Science and Technology Policy). Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, 2016.

[6] Cathy O’Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.

[7] Anthony W Flores, Kristin Bechtel, and Christopher T Lowenkamp. False positives, false negatives, and false analyses: A rejoinder to “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.” Fed. Probation, 2016.
