# Machine Learning III

## B5L-C: Machine Learning III

Session Type: Lecture

Session Code: B5L-C

Location: Room 3

Date & Time: Thursday March 23, 2023 (15:20-16:20)

Chair: Najim Dehak

Track: 5

| Paper ID | Paper Title | Authors | Abstract |
|---|---|---|---|
| 3057 | Generative Versus Discriminative Data-Driven Graph Filtering of Random Graph Signals | Lital Dabush, Nir Shlezinger, Tirza Routtenberg | In this paper we consider the problem of recovering random graph signals with graph signal processing (GSP) tools. We focus on partially-known linear settings, where one has access to data to cope with the missing domain knowledge in designing a graph filter for signal recovery. In this work, we formulate two main approaches for leveraging both the available domain knowledge and data for such graph filter design: 1) the GSP-generative approach, where data is used to fit the underlying linear model, which determines the graph filter; and 2) the GSP-discriminative approach, where data is used to directly learn the graph filter for graph signal recovery, bypassing the need to estimate the underlying model. We then compare these two graph filter design approaches qualitatively and quantitatively. Our results provide an understanding of which approach is preferable in which regime. In particular, it is shown that GSP-discriminative learning reliably copes with mismatches in the available domain knowledge, since it bypasses the need to fit the underlying model. On the other hand, the model awareness of the GSP-generative approach allows it to achieve a lower mean-squared error (MSE) when data is scarce. In the asymptotic regime where the number of training data points approaches infinity, both approaches achieve the oracle minimum-MSE estimator under the considered setting. |
| 3027 | Particle Thompson Sampling with Static Particles | Zeyu Zhou, Bruce Hajek | Particle Thompson sampling (PTS) is a simple and flexible approximation of Thompson sampling for solving stochastic bandit problems. PTS circumvents the intractability of maintaining a continuous posterior distribution in Thompson sampling by replacing the continuous distribution with a discrete distribution supported on a set of weighted static particles. We analyze the dynamics of particles' weights in PTS for general stochastic bandits without assuming that the set of particles contains the unknown system parameter. It is shown that fit particles survive and unfit particles decay, with fitness measured in KL divergence. For Bernoulli bandit problems, all but a few fit particles decay. |
| 3098 | Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent | Richeng Jin, Xiaofan He, Huaiyu Dai | While machine learning has achieved remarkable results in a wide variety of domains, training models often requires large datasets that may need to be collected from different individuals. As sensitive information may be contained in an individual's dataset, sharing training data may lead to severe privacy concerns. There is therefore a compelling need to develop privacy-aware machine learning methods, for which one effective approach is to leverage the generic framework of differential privacy. Since stochastic gradient descent (SGD) is one of the most commonly adopted methods for large-scale machine learning problems, a decentralized differentially private SGD algorithm is proposed in this work. In particular, we focus on SGD without replacement due to its favorable structure for practical implementation. Both privacy and convergence analyses are provided for the proposed algorithm. Finally, extensive experiments demonstrate the effectiveness of the proposed method. |
| 3122 | Stochastic Mean-Shift for Speaker Clustering | Itshak Lapidot | This work continues our previous work on short-segment speaker clustering. We have shown that the mean-shift clustering algorithm, with the probabilistic linear discriminant analysis (PLDA) score as the similarity measure, can be a good approach for this task. While the standard mean-shift clustering algorithm is deterministic, in this work we propose a stochastic version of mean-shift training. Clustering quality is measured by the value K, the geometric mean of average cluster purity (ACP) and average speaker purity (ASP). We test the proposed algorithm on 3 to 60 speakers and show that it outperforms the deterministic mean-shift in all cases. |
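Of the methods above, particle Thompson sampling (paper 3027) is the most self-contained to illustrate. The following is a minimal sketch of PTS for a Bernoulli bandit, based only on the abstract's description (static weighted particles replacing the continuous posterior, with weights updated by Bayes' rule); it is not the authors' code, and all variable names and the particular update schedule are illustrative assumptions.

```python
import numpy as np

def particle_thompson_sampling(true_means, particles, n_rounds=2000, seed=0):
    """Particle Thompson sampling for a Bernoulli bandit.

    The posterior over the unknown arm means is approximated by a fixed
    (static) set of weighted particles; each particle is a candidate
    vector of arm means. Weights are updated by Bayes' rule after each
    observed reward.
    """
    rng = np.random.default_rng(seed)
    n_particles = len(particles)
    weights = np.full(n_particles, 1.0 / n_particles)  # uniform prior
    total_reward = 0.0
    for _ in range(n_rounds):
        # Thompson step: sample one particle according to its weight,
        # then play the arm that is best under that sampled parameter.
        k = rng.choice(n_particles, p=weights)
        arm = int(np.argmax(particles[k]))
        reward = rng.random() < true_means[arm]  # Bernoulli draw
        total_reward += reward
        # Bayes update: reweight every particle by the likelihood it
        # assigns to the observed reward on the played arm.
        p_arm = particles[:, arm]
        likelihood = p_arm if reward else (1.0 - p_arm)
        weights = weights * likelihood
        weights /= weights.sum()
    return weights, total_reward
```

Consistent with the abstract's finding, the particle whose predictions are closest to the truth in KL divergence ends up dominating the weight distribution, while unfit particles decay, even when no particle equals the true parameter.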