Information-theoretic Methods for Machine Learning II – Invited Special Session
Session Type: Lecture
Session Code: A5L-D
Location: Room 4
Date & Time: Wednesday March 22, 2023 (15:20-16:20)
Chair: Flavio Calmon
Rashomon Effect in Machine Learning
It is almost always easier to find an accurate-but-complex model than an accurate-yet-simple one. Finding optimal, sparse, accurate models of various forms is generally NP-hard, and because we often cannot tell in advance whether a search for a simpler model will pay off, we frequently do not attempt it. In this talk, we ask an important practical question: can we show that accurate-yet-simple models are likely to exist before explicitly searching for them? The Rashomon set is the set of almost-equally-accurate models from a function class. We show that if the Rashomon set is large, it contains numerous accurate models, and perhaps at least one of them is the simple model we desire. We formally present new metrics for characterizing the Rashomon set and discuss evidence that many real-world data sets admit large Rashomon sets.
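To make the definition concrete: for a finite function class F with loss L and best model f*, the Rashomon set is R(ε) = {f ∈ F : L(f) ≤ L(f*) + ε}, and one simple way to measure its size is the Rashomon ratio |R(ε)| / |F|. The abstract does not specify the talk's new metrics, so the ratio below is an illustrative assumption, as are the hypothetical helpers `stump_hypotheses` and `rashomon_set` and the choice of F as single-feature threshold stumps with 0-1 loss. This is a minimal sketch, not the authors' method.

```python
import numpy as np


def stump_hypotheses(X):
    """Hypothetical finite class F: single-feature threshold stumps, using every
    observed value as a threshold, in both polarities.
    Returns an (|F|, n) array of 0/1 predictions, one row per model."""
    X = np.asarray(X, dtype=float)
    preds = []
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            preds.append((X[:, j] >= t).astype(int))  # predict 1 when x_j >= t
            preds.append((X[:, j] <= t).astype(int))  # predict 1 when x_j <= t
    return np.array(preds)


def rashomon_set(preds, y, eps):
    """Indices of models whose 0-1 loss is within eps of the best model in F,
    together with the Rashomon ratio |R(eps)| / |F|."""
    losses = (preds != np.asarray(y)).mean(axis=1)
    members = np.flatnonzero(losses <= losses.min() + eps)
    return members, len(members) / len(preds)


# Toy data: one feature, labels switch from 0 to 1 between x = 1 and x = 2.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
members, ratio = rashomon_set(stump_hypotheses(X), y, eps=0.25)
print(len(members), ratio)  # 3 of the 8 stumps are within 0.25 of the best loss
```

Here the 8 stumps have losses between 0 and 1, and three of them (x ≥ 1, x ≥ 2, x ≥ 3) fall within ε = 0.25 of the perfect stump, giving a ratio of 0.375. Models are counted by their parameters, so two stumps that make identical predictions are counted separately.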
Partial Information Decomposition in Algorithmic Fairness
In algorithmic fairness, resolving legal disputes or informing policy requires digging deeper to understand how a disparity arose. For instance, hiring disparities that can be explained by an occupational necessity (e.g., code-writing ability for a software engineering role) may be exempt by law, whereas a disparity arising from an aptitude test may not be (Griggs v. Duke Power Co.). In this talk, I will discuss a question that bridges fairness, explainability, and law: how do we check whether the disparity in a model is purely due to critical occupational necessities? We propose a systematic measure of non-exempt disparity that brings together causality and information theory, in particular an emerging body of work in information theory called Partial Information Decomposition (PID). PID quantifies the information that several random variables provide about another random variable, either individually (unique information), redundantly (shared information), or only jointly (synergistic information). To arrive at our measure of non-exempt disparity, we first examine several canonical examples that lead to a set of desirable properties (axioms) that such a measure should satisfy, and then propose a measure that satisfies them. Time permitting, I will also discuss our recent results on using PID to quantify trade-offs between local and global fairness in federated learning.
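For two sources A and B and a target T, PID splits the joint mutual information as I(T; A, B) = Red + Unq_A + Unq_B + Syn, with I(T; A) = Red + Unq_A and I(T; B) = Red + Unq_B. Fixing any redundancy measure therefore determines the other three terms. The sketch below assumes the original Williams-Beer I_min redundancy, which is only one of several competing PID definitions and not necessarily the one used in the talk; the talk's non-exempt disparity measure is not reproduced. The function names are hypothetical.

```python
import numpy as np


def _mutual_info(p_xy):
    """I(X;Y) in bits for a 2-D joint pmf p(x, y)."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (px * py)[nz])))


def _specific_info(p_tx):
    """Williams-Beer specific information I(T=t; X) for every t, in bits:
    sum_x p(x|t) * log2(p(t|x) / p(t))."""
    p_t = p_tx.sum(axis=1)
    p_x = p_tx.sum(axis=0)
    out = np.zeros(len(p_t))
    for t, x in zip(*np.nonzero(p_tx)):
        p_t_given_x = p_tx[t, x] / p_x[x]
        out[t] += (p_tx[t, x] / p_t[t]) * np.log2(p_t_given_x / p_t[t])
    return out


def pid_two_sources(p):
    """Two-source PID of I(T; A, B) using the Williams-Beer I_min redundancy.
    p: array of shape (|T|, |A|, |B|) holding the joint pmf p(t, a, b)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p_t = p.sum(axis=(1, 2))
    p_ta, p_tb = p.sum(axis=2), p.sum(axis=1)
    # Redundancy: expected minimum specific information over the two sources.
    redundant = float(np.sum(p_t * np.minimum(_specific_info(p_ta), _specific_info(p_tb))))
    unique_a = _mutual_info(p_ta) - redundant
    unique_b = _mutual_info(p_tb) - redundant
    joint = _mutual_info(p.reshape(p.shape[0], -1))  # treat (A, B) as one variable
    synergistic = joint - unique_a - unique_b - redundant
    return {"redundant": redundant, "unique_a": unique_a,
            "unique_b": unique_b, "synergistic": synergistic}


# XOR: T = A xor B with A, B independent fair bits; only the pair is informative.
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a ^ b, a, b] = 0.25
print(pid_two_sources(xor))  # synergistic is 1 bit, the other terms are 0
```

The three canonical cases separate cleanly under this measure: T = A xor B is purely synergistic, T = A = B is purely redundant, and T = A with B independent noise is purely unique to A.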
Model Projection: from Theory to Practice
Wael Alghamdi, Flavio Calmon
We consider the problem of producing fair probabilistic classifiers for multi-class classification tasks. We formulate this problem as “projecting” a pre-trained (and potentially unfair) classifier onto the set of models that satisfy target group-fairness requirements. The projected model is obtained by post-processing the outputs of the pre-trained classifier with a multiplicative factor. We provide a parallelizable iterative algorithm for computing the projected classifier and derive both sample-complexity and convergence guarantees. Comprehensive numerical comparisons with state-of-the-art benchmarks show that our approach achieves competitive accuracy-fairness trade-off curves while running favorably on large datasets. We also evaluate our method at scale on an open dataset with multiple classes, multiple intersectional protected groups, and over 1M samples.
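The sketch below shows why projection yields a multiplicative post-processing rule, under simplifying assumptions that are not from the talk: the projection minimizes the average KL divergence D(h(·|x) ‖ h0(·|x)) to the base classifier h0, the group label is observed at prediction time, and the fairness requirement is an exact statistical-parity constraint that every group's mean predicted class distribution equals a fixed target vector. Under these assumptions the Lagrangian gives h(y|x) ∝ h0(y|x) · exp(λ_{s,y}) for a sample in group s, and the multipliers λ can be found by gradient ascent on the concave dual. The helper `project_statistical_parity` is hypothetical; the authors' algorithm, constraint tolerances, and guarantees are not reproduced here.

```python
import numpy as np


def project_statistical_parity(h0, groups, target, lr=1.0, tol=1e-9, max_iter=10_000):
    """Hypothetical helper: KL-project base class probabilities onto the set of
    classifiers whose per-group mean prediction equals `target`.

    h0:     (n, k) strictly positive base probabilities; rows sum to 1
    groups: (n,) group label of each sample
    target: (k,) probability vector every group's mean prediction must match
    Returns (h, factors): projected probabilities and, per group, the
    multiplicative factors exp(lambda) applied before row renormalization.
    """
    h0 = np.asarray(h0, dtype=float)
    groups = np.asarray(groups)
    target = np.asarray(target, dtype=float)
    labels = np.unique(groups)
    lam = np.zeros((len(labels), h0.shape[1]))
    h = np.empty_like(h0)

    def tilt(base, lam_g):
        # Multiply base probabilities by exp(lambda) and renormalize each row.
        w = base * np.exp(lam_g)
        return w / w.sum(axis=1, keepdims=True)

    for gi, g in enumerate(labels):
        base = h0[groups == g]
        for _ in range(max_iter):
            # Gradient of the concave dual: target minus current group mean.
            grad = target - tilt(base, lam[gi]).mean(axis=0)
            if np.abs(grad).max() < tol:
                break
            lam[gi] += lr * grad
        h[groups == g] = tilt(base, lam[gi])
    return h, dict(zip(labels.tolist(), np.exp(lam)))


# Synthetic "unfair" base classifier: group 1 is skewed toward class 0.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 200)
h0 = np.vstack([rng.dirichlet([1, 1, 1], 200), rng.dirichlet([3, 1, 1], 200)])
target = h0.mean(axis=0)
h, factors = project_statistical_parity(h0, groups, target)
print({g: h[groups == g].mean(axis=0).round(4) for g in (0, 1)})
```

Because each group's problem is separate once the target is fixed, the per-group loops could run in parallel. Fixing the target in advance is itself a simplification: a fuller formulation would optimize it jointly or allow a tolerance around it.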