Distributed & Robust Machine Learning I – Invited Special Session

Session Type: Lecture
Session Code: A1L-E
Location: Room 5
Date & Time: Wednesday, March 22, 2023 (09:00 - 10:00)
Chair: Cong Shen
Track: 12

Papers (each entry lists paper ID, paper name, authors, and abstract):
Paper ID: 3206
Paper Name: Federated Survival Analysis with Competing Events
Authors: Md Mahmudur Rahman, Sanjay Purushotham
Abstract: Federated learning enables multiple medical institutions to jointly train and validate state-of-the-art machine learning models to improve patient outcomes without sharing sensitive patient data. In this work, we investigate Federated Survival Analysis (FSA), i.e., using the federated learning framework for patient survival predictions. FSA is very challenging due to two critical survival data properties, namely, censoring and competing risks. Censoring of event status (death or survival) commonly occurs due to loss to follow-up or end of the study and is known to lead to inaccurate survival risk predictions. On the other hand, the presence of competing events, i.e., where one of the events influences the risk of the occurrence of another event (competing risks), can lead to biased risk predictions. To address these important issues, (a) we propose a simple algorithm to estimate consistent federated pseudo values to handle censoring and competing events (even for non-uniformly distributed data), and (b) we introduce a flexible pseudo-value-based deep learning framework named TransFedCRA, where we employ Transformer-based regression models for subject-specific prediction of the marginal risk of an event. Experiments on large-scale real-world SEER data demonstrate that our TransFedCRA framework achieves better performance than other federated learning approaches with competing events.

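For readers unfamiliar with pseudo values, the sketch below shows the standard centralized jackknife construction for a cumulative incidence function under competing risks. It is not the federated algorithm proposed in paper 3206, whose details the abstract does not give. The function names `cumulative_incidence` and `pseudo_values`, the event coding (0 for censored, positive integers for causes), and the toy data are all assumptions made for illustration.

```python
def cumulative_incidence(times, events, cause, t):
    """Aalen-Johansen estimate of P(T <= t, event == cause).

    Assumed coding: events[i] == 0 means censored, 1, 2, ... are causes.
    """
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(1 for x in times if x >= u)
        d_cause = sum(1 for x, e in zip(times, events) if x == u and e == cause)
        d_any = sum(1 for x, e in zip(times, events) if x == u and e != 0)
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif


def pseudo_values(times, events, cause, t):
    """Jackknife pseudo values: n * F(t) - (n - 1) * F_without_i(t)."""
    n = len(times)
    full = cumulative_incidence(times, events, cause, t)
    values = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]    # leave subject i out
        loo_events = events[:i] + events[i + 1:]
        loo = cumulative_incidence(loo_times, loo_events, cause, t)
        values.append(n * full - (n - 1) * loo)
    return values


# Toy data (hypothetical): four subjects, two causes, no censoring.
times = [1, 2, 3, 4]
events = [1, 2, 1, 2]
pv = pseudo_values(times, events, cause=1, t=2.5)
```

When no subject is censored, each pseudo value reduces (up to floating-point error) to the indicator that the subject experienced the cause of interest by time t. With censoring, the pseudo values are real-valued and can serve as regression targets, which is the role they play in pseudo-value-based frameworks.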
Paper ID: 3170
Paper Name: Model Segmentation for Storage Efficient Private Federated Learning with Top R Sparsification
Authors: Sajani Vithana, Sennur Ulukus
Abstract: In federated learning (FL) with top r sparsification, millions of users collectively train a machine learning (ML) model locally using their personal data, communicating only the most significant r fraction of updates to reduce the communication cost. It has been shown that the values as well as the indices of these selected (sparse) updates leak information about the users' personal data. In this work, we investigate different methods to carry out user-database communications in FL with top r sparsification efficiently, while guaranteeing information-theoretic privacy of users' personal data. These methods incur considerable storage cost. As a solution, we present two schemes with different properties that use MDS-coded storage along with a model segmentation mechanism to reduce the storage cost at the expense of a controllable amount of information leakage, to perform private FL with top r sparsification.

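The top r sparsification step mentioned in the abstract of paper 3170 can be illustrated with a minimal sketch, assuming "most significant" means largest in absolute value. The helper names `top_r_sparsify` and `densify` and the example update vector are hypothetical, and the sketch covers only the plain sparsification step: it includes none of the privacy, MDS-coded storage, or model segmentation mechanisms the paper proposes.

```python
import math


def top_r_sparsify(update, r):
    """Keep the ceil(r * len(update)) largest-magnitude entries of an update.

    Returns (indices, values): the sparse message a user would upload.
    """
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    k = max(1, math.ceil(r * len(update)))
    # Rank coordinates by absolute value, largest first.
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    indices = sorted(ranked[:k])
    values = [update[i] for i in indices]
    return indices, values


def densify(indices, values, size):
    """Rebuild a dense update, filling unsent coordinates with zero."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# Hypothetical local update with 8 coordinates; r = 0.25 keeps 2 of them.
update = [0.05, -1.2, 0.3, 0.0, 2.5, -0.01, 0.7, -0.4]
indices, values = top_r_sparsify(update, r=0.25)
restored = densify(indices, values, len(update))
```

Both parts of the uploaded message, `indices` and `values`, depend on the user's data, which is why the abstract treats each as a source of leakage.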
Paper ID: 3224
Paper Name: Addressing Uncertainty and Ambiguity in Image Recognition: a Case Study on COVID-19 Data
Authors: Ping Xu, Athena Tian, Yue Wang
Abstract: Ambiguity and uncertainty are inherently present in many machine learning tasks, especially in medical diagnosis where the within-class variation and between-class similarity are high. To address these issues, we introduce the concept of ambiguous distribution (AD) detection, which is a preprocessing step prior to image classification with the aim of separating highly ambiguous sample classes from other classes. For AD detection, a deep neural network model with supervised contrastive loss is proposed, which is designed to push data of different classes to be well separated in an embedding space. Because a medical image database is often unbalanced and has a limited number of samples, we partially apply contrastive loss on tail-class in-distribution and AD samples instead of on the entire in-distribution samples. To account for both ambiguous and unbalanced class samples, our deep learning model is trained with an objective loss that combines in-distribution cross-entropy loss, KL divergence, and partial contrastive loss. It is shown to outperform conventional supervised learning and conventional contrastive learning in terms of the accuracy of both AD detection and image classification on the COVID-19 data.