Callum Ke

Optimising for Fairness in Classification Models

Introduction

Machine learning systems increasingly influence high-stakes decisions in areas like hiring, lending, and university admissions. As these systems proliferate, ensuring they make fair decisions across different demographic groups has become a critical concern. However, the field faces a fundamental challenge: there is no universally agreed-upon definition of algorithmic fairness that captures all aspects of equality.

This article explores the findings from my research at the University of Bristol, supervised by Dr. Laurence Aitchison, on optimizing for fairness in classification models. I propose that stakeholders should prioritize choosing inherently "fair" prediction targets rather than attempting to impose fairness constraints on models trained with biased objectives.

Key Insight

Post-hoc fairness interventions have limited effectiveness when the underlying prediction target is biased. Instead, stakeholders should prioritize selecting prediction targets that are inherently fair—those with equal base rates across demographic groups.

The Fairness Dilemma

A central challenge in fairness research is the absence of a formal definition that captures all dimensions of equality. This is complicated by the Impossibility Theorem, which demonstrates that the commonly used statistical fairness definitions cannot, in general, all be satisfied at the same time. Since each definition intuitively captures desirable aspects of equitable decision-making, practitioners must make difficult trade-offs.

Visualization 1: Relationships Between Fairness Criteria

The red dashed lines indicate conflicting criteria that generally cannot be satisfied simultaneously. Note that Independence (Demographic Parity) and Sufficiency (Calibration) have a particularly strong conflict unless the base rates between groups are equal.

Three Major Fairness Definitions

Let's denote our sensitive attribute as S ∈ {a, b}, the true outcome labels as Y ∈ {0, 1}, and the predicted outcomes as Ŷ ∈ {0, 1}.

1. Independence (Demographic Parity)

Ensures equal selection rates across groups: Ŷ ⊥ S

P(Ŷ = 1|S = a) = P(Ŷ = 1|S = b) = P(Ŷ = 1)
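
To make this concrete, here is a minimal NumPy sketch (not code from the study; the array names y_pred and s are placeholders) that estimates the demographic parity gap, i.e. the largest difference in selection rates between groups:

  import numpy as np

  def demographic_parity_gap(y_pred, s):
      # Selection rate P(Ŷ = 1 | S = g) for each group, then the largest gap.
      rates = [y_pred[s == g].mean() for g in np.unique(s)]
      return max(rates) - min(rates)

  # Toy example: group "b" is selected more often than group "a".
  y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])
  s = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
  print(demographic_parity_gap(y_pred, s))  # 0.75 - 0.50 = 0.25

A gap of 0 means the Independence criterion is satisfied exactly.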

2. Separation (Equalized Odds)

Ensures equal error rates across groups: Ŷ ⊥ S|Y

P(Ŷ = 1|Y = 1, S = a) = P(Ŷ = 1|Y = 1, S = b)

P(Ŷ = 1|Y = 0, S = a) = P(Ŷ = 1|Y = 0, S = b)
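
Under the same assumptions about placeholder names, the two separation gaps (the between-group differences in true-positive and false-positive rates) can be sketched as:

  import numpy as np

  def equalized_odds_gaps(y_true, y_pred, s):
      # P(Ŷ = 1 | Y = 1, S = g) and P(Ŷ = 1 | Y = 0, S = g) for each group,
      # then the between-group gap for each conditioning value of Y.
      # (Assumes every group contains examples of both outcome classes.)
      groups = np.unique(s)
      tpr = [y_pred[(y_true == 1) & (s == g)].mean() for g in groups]
      fpr = [y_pred[(y_true == 0) & (s == g)].mean() for g in groups]
      return max(tpr) - min(tpr), max(fpr) - min(fpr)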

3. Sufficiency (Calibration by Group)

Ensures equal precision rates across groups: Y ⊥ S|Ŷ

P(Y = 1|S = a, Ŷ = y) = P(Y = 1|S = b, Ŷ = y) for all y ∈ {0, 1}
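
The sufficiency gaps can be sketched the same way, comparing the rate of positive outcomes within each predicted class across groups (again an illustrative helper, not code from the study):

  import numpy as np

  def sufficiency_gaps(y_true, y_pred, s):
      # For each predicted value ŷ, compare P(Y = 1 | Ŷ = ŷ, S = g) across groups.
      # (Assumes every group receives both predicted values.)
      groups = np.unique(s)
      gaps = []
      for yhat in (0, 1):
          rates = [y_true[(y_pred == yhat) & (s == g)].mean() for g in groups]
          gaps.append(max(rates) - min(rates))
      return gaps  # [gap at Ŷ = 0, gap at Ŷ = 1]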

The Impossibility Theorem

The Impossibility Theorem states that these definitions cannot all be satisfied simultaneously, except in three scenarios:

  • We have a perfectly accurate predictor
  • Our predictor trivially assigns all predictions to a single value (0 or 1)
  • The target variable has equal base rates between sensitive groups

The third scenario provides the only reasonable option in most real-world applications where both fairness and utility matter.
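
To see why the third scenario matters, suppose Independence and Sufficiency both hold. Then for each group g ∈ {a, b}:

P(Y = 1|S = g) = Σ_y P(Y = 1|Ŷ = y, S = g) · P(Ŷ = y|S = g) = Σ_y P(Y = 1|Ŷ = y) · P(Ŷ = y) = P(Y = 1)

where the second equality applies Sufficiency and then Independence. The two criteria can therefore only coexist (non-trivially) when the groups already share the same base rate, which is exactly what a "fair goal" provides.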

A Novel Approach: The "Fair Goal"

Rather than applying fairness constraints to models trained on potentially biased targets, I propose shifting focus to the selection of inherently fair prediction objectives.

Defining a "Fair Goal"

In this research, a "fair goal" is defined as a prediction target where the underlying distribution has equal base rates of the positive outcome across all demographic groups. While identifying such goals isn't always straightforward, they may reasonably exist in contexts where fairness is an inherent objective.
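
In code, the corresponding check is straightforward; the sketch below (with placeholder array names and an illustrative tolerance, not a value from the study) tests whether a candidate target has equal base rates across groups:

  import numpy as np

  def is_fair_goal(y_target, s, tol=0.01):
      # A target qualifies as a "fair goal" here if P(Y = 1 | S = g) is
      # (approximately) equal for every group; tol is an illustrative threshold.
      base_rates = [y_target[s == g].mean() for g in np.unique(s)]
      return max(base_rates) - min(base_rates) <= tol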

Visualization 2: Base Rate Distributions Across Different Goals

The value-added measure shows equal base rates between groups, making it a "fair goal" by this definition.

University Admissions: A Case Study

To demonstrate this approach, I conducted a simulated university admissions experiment, comparing classifiers that decide acceptance by predicting each of three different targets:

  1. A-level scores (traditional academic achievement)
  2. Graduation scores (final university performance)
  3. Value-added measure (improvement from entry to graduation)

The experiment evaluated these targets against standard fairness metrics, with particular focus on the "value-added" measure as a potentially fair prediction target.
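
The exact data-generating process and constraint implementations are not reproduced here, but the shape of the comparison can be sketched with synthetic data; every distribution, coefficient, and threshold below is an illustrative assumption rather than part of the original experiment:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  n = 10_000
  s = rng.integers(0, 2, n)        # sensitive attribute: 0 = group a, 1 = group b
  ability = rng.normal(0, 1, n)    # latent ability, independent of group

  # Illustrative targets: A-level and graduation scores carry a group-dependent
  # offset (unequal base rates); value-added depends only on ability (equal base rates).
  a_level    = (ability + 0.8 * s + rng.normal(0, 1, n) > 0.4).astype(int)
  graduation = (ability + 0.4 * s + rng.normal(0, 1, n) > 0.2).astype(int)
  value_add  = (ability + rng.normal(0, 1, n) > 0.0).astype(int)

  X = np.column_stack([ability + rng.normal(0, 0.5, n), s])  # observed features

  for name, y in [("A-level", a_level), ("graduation", graduation), ("value-added", value_add)]:
      y_pred = LogisticRegression().fit(X, y).predict(X)
      rates = [y_pred[s == g].mean() for g in (0, 1)]
      print(f"{name:>12s} selection-rate gap: {abs(rates[0] - rates[1]):.3f}")

In this toy setup, only the value-added target produces a small selection-rate gap; the experiment itself evaluates all three targets against the full set of fairness metrics under explicit training constraints.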

Results

The results demonstrate that the "fair goal" (the value-added target) achieved lower costs on every fairness metric than the two "unfair" targets, and this held consistently across the different fairness constraints applied during training.
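
One plausible way to turn the three criteria into a single fairness cost, purely for illustration, is to sum the worst between-group gap for each criterion:

  import numpy as np

  def total_fairness_cost(y_true, y_pred, s):
      # Illustrative aggregation (not necessarily the formula used in the study):
      # sum of the largest between-group gaps in selection rate (Independence),
      # TPR and FPR (Separation), and precision (Sufficiency).
      groups = np.unique(s)

      def gap(vals):
          return max(vals) - min(vals)

      sel = gap([y_pred[s == g].mean() for g in groups])
      tpr = gap([y_pred[(y_true == 1) & (s == g)].mean() for g in groups])
      fpr = gap([y_pred[(y_true == 0) & (s == g)].mean() for g in groups])
      ppv = gap([y_true[(y_pred == 1) & (s == g)].mean() for g in groups])
      return sel + tpr + fpr + ppv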

Visualization 3: Fairness Costs Comparison

Total fairness costs across different fairness constraints, comparing the three prediction targets.

Key Findings

  1. Classifiers trained on the value-added target showed significantly lower total fairness costs across all fairness constraints.
  2. Traditional metrics like A-level scores and final graduation scores performed poorly on fairness measures even when fairness constraints were applied during training.
  3. The results confirm that choosing a fair prediction target is more effective than post-hoc fairness interventions on models trained with biased objectives.

Research Implications

These findings challenge the common approach of applying fairness constraints to models trained on potentially biased targets. Instead, they show that selecting an inherently fair prediction objective can lead to models that naturally satisfy multiple fairness criteria without sacrificing utility.

Practical Implications

These findings have significant implications for stakeholders in high-stakes decision-making contexts:

  1. Prioritize target selection: Resources should first be directed toward identifying and validating inherently fair prediction targets.
  2. Rethink fairness interventions: Pre-processing, in-processing, or post-processing fairness interventions have limited effectiveness when the underlying prediction target is biased.
  3. Measure what matters: Organizations should critically examine what they're optimizing for and consider alternative metrics that better align with fairness objectives.

Future Directions

While this research focuses on statistical group fairness, other fairness frameworks exist, including:

  • Causal fairness: Evaluating discrimination within causal frameworks
  • Counterfactual fairness: Ensuring predictions remain consistent in counterfactual scenarios
  • Individual fairness: Guaranteeing similar individuals receive similar outcomes

Future research could explore how choosing fair goals affects these alternative fairness definitions. Additionally, developing methods to identify or construct fair prediction targets in various domains remains an important research direction.

Conclusion

The pursuit of fairness in machine learning often focuses on algorithmic interventions, but my research suggests a more fundamental approach: selecting inherently fair prediction targets. By choosing goals with equal base rates across demographic groups, we can develop models that satisfy multiple fairness criteria simultaneously without sacrificing utility.

As machine learning systems continue to influence high-stakes decisions, this perspective offers a promising path forward for building more equitable AI systems.

References

  1. Aitchison, L., & Ke, C. (2025). Fairness in Machine Learning: The Importance of Choosing the Right Goal. University of Bristol, Department of Computer Science.
  2. Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org.
  3. Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153-163.
  4. Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
  5. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214-226.
  6. Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 3315-3323.
  7. Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807.
  8. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1-35.
  9. Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., & Weinberger, K. Q. (2017). On fairness and calibration. Advances in Neural Information Processing Systems, 30.
  10. Verma, S., & Rubin, J. (2018). Fairness definitions explained. IEEE/ACM International Workshop on Software Fairness, 1-7.