The Interplay Between Learning, Optimization, & Control I – Invited Special Session


Session Type: Lecture
Session Code: B1L-F
Location: Room 6
Date & Time: Thursday March 23, 2023 (09:00 - 10:00)
Chairs: Na Li, Guannan Qu
Track: 12
Papers (No., Title, Authors, Abstract):
Paper 3020: Model-Free Nonlinear Feedback Optimization
Authors: Zhiyu He{1}, Jianping He{2}, Xinping Guan{2}, Saverio Bolognani{1}, Florian Dörfler{1}
Abstract: Feedback optimization is a control paradigm that enables physical systems to autonomously reach efficient operating points. Its central idea is to interconnect optimization iterations in closed loop with the physical plant. Since iterative gradient-based methods are extensively used to achieve optimality, feedback optimization controllers typically require knowledge of the steady-state sensitivity of the plant, which may not be easily accessible in some applications. In contrast, in this paper we develop a model-free feedback controller for efficient steady-state operation of general dynamical systems. The proposed design consists of updating control inputs via gradient estimates constructed from evaluations of the nonconvex objective at the current input and at the measured output. We study the dynamic interconnection of the proposed iterative controller with a stable nonlinear discrete-time plant. For this setup, we characterize the optimality and the stability of the closed-loop behavior as functions of the problem dimension, the number of iterations, and the rate of convergence of the physical plant. To handle general constraints that affect multiple inputs, we enhance the controller with Frank-Wolfe type updates.
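The model-free update described in this abstract can be illustrated with a minimal sketch. It assumes a scalar objective `phi` that bundles the evaluation at the current input and the measured plant output; the two-point sphere-sampling estimator, the smoothing radius `delta`, and the plain gradient step are common illustrative choices, not necessarily the paper's exact scheme (the Frank-Wolfe variant for constraints is omitted):

```python
import numpy as np

def zeroth_order_gradient(phi, u, delta=1e-2, rng=None):
    """Two-point gradient estimate of a scalar objective phi at input u,
    using a random direction on the unit sphere (illustrative estimator)."""
    rng = np.random.default_rng() if rng is None else rng
    d = u.shape[0]
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)  # random unit direction
    # Finite difference along v, scaled by the dimension d so the
    # estimate is (approximately) unbiased for the true gradient.
    return d * (phi(u + delta * v) - phi(u - delta * v)) / (2 * delta) * v

def feedback_optimization(phi, u0, step=0.05, iters=500, rng=None):
    """Gradient-descent-style input update using only objective evaluations,
    i.e., no model of the plant's steady-state sensitivity."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.array(u0, dtype=float)
    for _ in range(iters):
        u -= step * zeroth_order_gradient(phi, u, rng=rng)
    return u
```

On a simple quadratic objective, the iterates drift toward the minimizer even though the controller never sees a gradient, which is the essence of the model-free design.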
Paper 3037: Multi-Armed Bandit Learning on a Graph
Authors: Tianpeng Zhang{1}, Kasper Johansson{2}, Na Li{1}
Abstract: The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotics, selecting an arm corresponds to a physical action that constrains the choices of the next available arms (actions). Motivated by this, we study an extension of MAB called the graph bandit, where an agent travels over a graph to maximize the reward collected from different nodes. The graph defines the agent's freedom in selecting the next available nodes at each step. We assume the graph structure is fully available, but the reward distributions are unknown. Building on an offline graph-based planning algorithm and the principle of optimism, we design a learning algorithm, G-UCB, that balances long-term exploration and exploitation. We show that our proposed algorithm achieves $O(\sqrt{|S|T\log(T)}+D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph, which matches the theoretical lower bound $\Omega(\sqrt{|S|T})$ up to logarithmic factors. To our knowledge, this result is among the first tight regret bounds in non-episodic, undiscounted learning problems with known deterministic transitions. Numerical experiments confirm that our algorithm outperforms several benchmarks. We also present a synthetic robotic application modeled under the graph bandit framework, where a robot moves on a network of rural/suburban locations to provide high-speed internet access using our proposed algorithm.
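The graph-bandit loop can be illustrated with a toy implementation; a minimal sketch, assuming an unweighted graph given as an adjacency dict and a reward callback. The function names, the UCB exploration constant, and the travel policy (BFS shortest path to the node with the highest index) are illustrative simplifications, not the G-UCB algorithm as analyzed in the paper:

```python
import math
import random
from collections import deque

def shortest_path(adj, start, goal):
    """BFS shortest path on an unweighted graph (list of nodes, start..goal)."""
    prev = {start: None}
    q = deque([start])
    while q:
        node = q.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in adj[node]:
            if nb not in prev:
                prev[nb] = node
                q.append(nb)
    raise ValueError("goal unreachable")

def g_ucb(adj, reward, start, horizon, c=2.0, rng=None):
    """Toy graph bandit: repeatedly travel to the node with the highest
    UCB index, collecting rewards at every node visited along the way."""
    rng = rng or random.Random(0)
    counts = {v: 0 for v in adj}
    sums = {v: 0.0 for v in adj}
    node, t, total = start, 0, 0.0
    while t < horizon:
        def ucb(v):
            if counts[v] == 0:
                return float("inf")  # optimism: try every node once
            return sums[v] / counts[v] + math.sqrt(c * math.log(t + 1) / counts[v])
        target = max(adj, key=ucb)
        # Walk toward the target; if already there, sample it in place.
        steps = shortest_path(adj, node, target)[1:] or [node]
        for v in steps:
            r = reward(v, rng)
            sums[v] += r
            counts[v] += 1
            total += r
            t += 1
            node = v
            if t >= horizon:
                break
    return total, counts
```

On a small line graph with one clearly best node, the agent's visit counts concentrate on that node after a short exploration phase, mirroring the exploration-exploitation balance described in the abstract.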