Machine learning systems increasingly influence high-stakes decisions in areas like hiring, lending, and university admissions. As these systems proliferate, ensuring they make fair decisions across different demographic groups has become a critical concern. However, the field faces a fundamental challenge: there is no universally agreed-upon definition of algorithmic fairness that captures all aspects of equality.
This article explores the findings from my research at the University of Bristol, supervised by Dr. Laurence Aitchison, on optimizing for fairness in classification models. I propose that stakeholders should prioritize choosing inherently "fair" prediction targets rather than attempting to impose fairness constraints on models trained with biased objectives.
Post-hoc fairness interventions have limited effectiveness when the underlying prediction target is biased. Instead, stakeholders should prioritize selecting prediction targets that are inherently fair—those with equal base rates across demographic groups.
A central challenge in fairness research is the absence of a formal definition that captures all dimensions of equality. This is complicated by the Impossibility Theorem, which demonstrates that the commonly used statistical fairness definitions cannot, in general, all be satisfied simultaneously. Since each definition intuitively captures desirable aspects of equitable decision-making, practitioners must make difficult trade-offs.
The red dashed lines indicate conflicting criteria that generally cannot be satisfied simultaneously. Note that Independence (Demographic Parity) and Sufficiency (Calibration) have a particularly strong conflict unless the base rates between groups are equal.
Let's denote our sensitive attribute as S ∈ {a, b}, the true outcome labels as Y ∈ {0, 1}, and the predicted outcomes as Ŷ ∈ {0, 1}.
Independence (Demographic Parity) ensures equal selection rates across groups: Ŷ ⊥ S
P(Ŷ = 1|S = a) = P(Ŷ = 1|S = b) = P(Ŷ = 1)
Separation (Equalized Odds) ensures equal error rates across groups: Ŷ ⊥ S|Y
P(Ŷ = 1|Y = 1, S = a) = P(Ŷ = 1|Y = 1, S = b)
P(Ŷ = 1|Y = 0, S = a) = P(Ŷ = 1|Y = 0, S = b)
Sufficiency (Calibration) ensures equal precision across groups: Y ⊥ S|Ŷ
P(Y = 1|S = a, Ŷ = y) = P(Y = 1|S = b, Ŷ = y) for y ∈ {0, 1}
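All three criteria can be measured empirically from a classifier's predictions as gaps between the two groups. The following is a minimal sketch (the function name and interface are my own, not from the research):

```python
import numpy as np

def fairness_gaps(y_true, y_pred, s):
    """Empirical violations of the three fairness criteria for a binary
    classifier, with sensitive attribute s taking values "a" and "b".
    Each gap is 0 when the corresponding criterion holds exactly."""
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))
    ga, gb = (s == "a"), (s == "b")

    # Independence: gap in selection rates P(Yhat = 1 | S)
    independence = abs(y_pred[ga].mean() - y_pred[gb].mean())

    # Separation: largest gap in TPR/FPR, P(Yhat = 1 | Y = y, S)
    def rate(group, y):
        return y_pred[group & (y_true == y)].mean()
    separation = max(abs(rate(ga, 1) - rate(gb, 1)),
                     abs(rate(ga, 0) - rate(gb, 0)))

    # Sufficiency: largest gap in P(Y = 1 | Yhat = y, S)
    def precision(group, yhat):
        return y_true[group & (y_pred == yhat)].mean()
    sufficiency = max(abs(precision(ga, 1) - precision(gb, 1)),
                      abs(precision(ga, 0) - precision(gb, 0)))

    return independence, separation, sufficiency
```

For a perfect predictor on data with equal base rates, all three gaps are zero, which previews the special cases discussed next.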
The Impossibility Theorem states that these definitions are mutually exclusive except in three degenerate scenarios: the classifier is trivial (Ŷ ⊥ Y, sacrificing all utility), the classifier is perfect (Ŷ = Y, unattainable in practice), or the base rates of the positive outcome are equal across groups.
The third scenario, equal base rates, provides the only reasonable option in most real-world applications where both fairness and utility matter.
Rather than applying fairness constraints to models trained on potentially biased targets, I propose shifting focus to the selection of inherently fair prediction objectives.
In this research, a "fair goal" is defined as a prediction target where the underlying distribution has equal base rates of the positive outcome across all demographic groups. While identifying such goals isn't always straightforward, they may reasonably exist in contexts where fairness is an inherent objective.
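Checking the "fair goal" condition on a candidate target reduces to comparing per-group base rates. A minimal sketch, where the function name and the tolerance for finite-sample noise are my own assumptions:

```python
import numpy as np

def is_fair_goal(y, s, tol=0.02):
    """Return whether target y has approximately equal base rates
    P(Y = 1 | S = g) across all groups g, plus the rates themselves.
    tol is an assumed allowance for finite-sample noise."""
    y, s = np.asarray(y), np.asarray(s)
    rates = {g: float(y[s == g].mean()) for g in np.unique(s)}
    gap = max(rates.values()) - min(rates.values())
    return gap <= tol, rates
```

In practice one would also want a statistical test (or confidence intervals) on the base-rate gap rather than a fixed tolerance, but the idea is the same.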
The Value Added measure shows equal base rates between groups, making it a "fair goal" by our definition.
To demonstrate this approach, I conducted a simulated university-admissions experiment, comparing classifiers trained to predict acceptance based on three different targets.
The experiment evaluated these targets against standard fairness metrics, with particular focus on the "value-added" measure as a potentially fair prediction target.
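To make the setup concrete, here is an illustrative toy, not the actual experiment from the research (the bias magnitude of 0.8 and the noise scales are assumptions): a biased historical target has unequal base rates by construction, while a target driven by latent ability alone does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
s = rng.integers(0, 2, n)           # sensitive attribute: group 0 or 1
ability = rng.normal(0.0, 1.0, n)   # latent merit, independent of s

# "Unfair" target: historical decisions with an assumed bias of -0.8
# against group 1, so base rates differ between groups by construction.
y_unfair = (ability - 0.8 * s + rng.normal(0, 0.5, n) > 0).astype(int)

# "Fair goal": a target determined by ability alone, so base rates are
# approximately equal across groups (up to sampling noise).
y_fair = (ability + rng.normal(0, 0.5, n) > 0).astype(int)

def base_rate_gap(y, s):
    return abs(y[s == 0].mean() - y[s == 1].mean())

print("unfair target gap:", round(base_rate_gap(y_unfair, s), 3))
print("fair goal gap:   ", round(base_rate_gap(y_fair, s), 3))
```

A classifier trained on the second target starts from equal base rates, which is exactly the precondition under which the three fairness criteria can coexist.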
The results clearly demonstrate that the "fair goal" (the value-added target) simultaneously achieved lower costs across all fairness metrics than the two "unfair" targets, and this held consistently across different training fairness constraints.
Total fairness costs across different fairness constraints, comparing the three prediction targets.
These findings challenge the common approach of applying fairness constraints to models trained on potentially biased targets. Instead, they demonstrate that selecting inherently fair prediction objectives can lead to models that naturally satisfy multiple fairness criteria without sacrificing utility.
These findings have significant implications for stakeholders in high-stakes decision-making contexts: before training, candidate prediction targets should be audited for equal base rates across groups, rather than relying solely on post-hoc fairness constraints.
While this research focuses on statistical group fairness, other fairness frameworks exist, including individual fairness and counterfactual (causal) fairness.
Future research could explore how choosing fair goals affects these alternative fairness definitions. Additionally, developing methods to identify or construct fair prediction targets in various domains remains an important research direction.
The pursuit of fairness in machine learning often focuses on algorithmic interventions, but my research suggests a more fundamental approach: selecting inherently fair prediction targets. By choosing goals with equal base rates across demographic groups, we can develop models that satisfy multiple fairness criteria simultaneously without sacrificing utility.
As machine learning systems continue to influence high-stakes decisions, this perspective offers a promising path forward for building more equitable AI systems.