Theory of Equivariant Machine Learning II – Invited Special Session
Session Type: Lecture
Session Code: A2L-D
Location: Room 4
Date & Time: Wednesday March 22, 2023 (10:20 - 11:20)
Chairs: David Hogg, Soledad Villar
Track: 12
Paper ID | Paper Name | Authors | Abstract |
---|---|---|---|
3025 | A Fourier View of Equivariant Neural Networks on Homogeneous Spaces | Yinshuang Xu, Jiahui Lei, Kostas Daniilidis | We present a unified framework for equivariant networks on homogeneous spaces. We consider the feature fields, before and after a convolutional layer, to be tensor-valued. Making use of the sparse Fourier coefficients of the lifted feature fields, we propose a unified derivation of kernels in the Fourier domain. The sparsity is established when the stabilizer subgroup is a compact Lie group. After lifting the features to functions on the group, we further incorporate a nonlinear activation and project back to the field by an equivariant convolution. We demonstrate that previous techniques which treat features as Fourier coefficients over the stabilizer subgroup are a particular case of our activation. Experiments on SO(3) and SE(3) show the effectiveness and robustness of our model on tasks of spherical vector field regression, point cloud classification, and missing molecular prediction. |
3146 | Implicit Bias of Linear Equivariant Networks | Hannah Lawrence, Kristian Georgiev, Andy Dienes, Bobak Kiani | Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of technical applications by explicitly encoding symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by their explicit symmetry bias, a recent line of work has proposed that the implicit bias of training algorithms on particular architectures is key to understanding generalization for overparameterized neural nets. In this context, we show that L-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the 2/L-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative groups (such as permutations), as well as band-limited G-CNNs over infinite groups. We validate our theorems via experiments on a variety of groups, and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles. |
3222 | Causal Lifting and Link Prediction | Leonardo Cotta | Current state-of-the-art causal models for link prediction assume an underlying set of inherent node factors (innate characteristics defined at the node's birth) that govern the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent, i.e., the outcome of link interventions depends on existing links. For instance, in the customer-product graph of an online retailer, the effect of an 85-inch TV ad (treatment) likely depends on whether the customer already has an 85-inch TV. Unfortunately, existing causal methods are impractical in these scenarios: the cascading functional dependencies between links (due to path dependence) are either unidentifiable or require an impractical number of control variables. To remedy this shortcoming, this work develops the first causal model capable of dealing with path dependencies in link prediction. It introduces the concept of causal lifting, an invariance in causal models that, when satisfied, allows the identification of causal link prediction queries using limited interventional data. On the estimation side, we show how structural pairwise embeddings (a type of symmetry-based joint representation of node pairs) present lower bias and variance than existing node embedding methods, e.g., GNNs and matrix factorization. Finally, we validate our theoretical findings on four datasets under three different scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation, and consumer-product recommendations. |
3199 | Group-Invariant Max Filtering | Dustin Mixon | Given a group action on a vector space, we study the problem of effectively separating the orbits under this action. After briefly discussing the history of this problem, we introduce a family of invariant functions that we call max filters. When the group is a finite subgroup of the orthogonal group, a sufficiently large max filter bank can separate the orbits, and even be bilipschitz in the quotient metric. We conclude by applying max filters to various machine learning tasks. |
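A schematic reading of the claim in paper 3146 above, written in assumed notation rather than the paper's exact statement: $G$ is a finite group, $\beta$ the end-to-end linear predictor of an $L$-layer full-width linear G-CNN, $\widehat{G}$ the set of irreducible representations of $G$, and $\hat{\beta}(\rho)$ the matrix Fourier coefficient of $\beta$ at the irrep $\rho$.

```latex
% Schematic only: the implicit-bias claim of paper 3146, in the spirit of
% margin-maximization results for linear networks; notation is editorial.
% Gradient descent on separable binary classification converges in direction
% toward a predictor \beta whose Fourier coefficients solve
\[
  \min_{\beta}\; \sum_{\rho \in \widehat{G}}
    \bigl\| \hat{\beta}(\rho) \bigr\|_{S_{2/L}}^{2/L}
  \quad \text{s.t.} \quad y_i \langle \beta, x_i \rangle \ge 1 \;\;\text{for all } i,
\]
% where \|\cdot\|_{S_{2/L}} is the 2/L-Schatten (quasi-)norm, so larger depth L
% pushes the irrep coefficients \hat{\beta}(\rho) toward low rank.
```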
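The max filters of paper 3199 also admit a minimal computational sketch. The code below is an illustration under assumptions not made in the abstract: the group is fixed to cyclic coordinate shifts acting on R^d, and the helper names (`cyclic_orbit`, `max_filter`) are hypothetical. It checks that a bank of max filters, each taking the maximum inner product between a template and the orbit of the input, yields shift-invariant features.

```python
import numpy as np

# Illustrative sketch, not code from the paper: max filters for the finite
# group of cyclic coordinate shifts acting on R^d. A max filter with template
# y maps x to max over the orbit of x of <g.x, y>, which is invariant in x.

def cyclic_orbit(x):
    """All cyclic shifts of x, i.e. the orbit of x under the shift group."""
    return [np.roll(x, k) for k in range(len(x))]

def max_filter(x, template):
    """Maximum inner product between the template and any point in x's orbit."""
    return max(float(np.dot(gx, template)) for gx in cyclic_orbit(x))

rng = np.random.default_rng(0)
d, n_templates = 8, 16
templates = rng.standard_normal((n_templates, d))   # a max filter bank

x = rng.standard_normal(d)
feats = np.array([max_filter(x, t) for t in templates])
feats_shifted = np.array([max_filter(np.roll(x, 3), t) for t in templates])

# Shifting x only permutes its orbit, so the max filter outputs are unchanged.
assert np.allclose(feats, feats_shifted)
print(feats[:4])
```

With enough generic templates, such a bank can also distinguish inputs lying in different orbits, which is the separation property the abstract describes for finite subgroups of the orthogonal group.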