# A2L-F: Learning for Optimization & Control II – Invited Special Session

Session Type: Lecture

Session Code: A2L-F

Location: Room 6

Date & Time: Wednesday March 22, 2023 (10:20 - 11:20)

Chair: Enrique Mallada, Mahyar Fazlyab

Track: 12

Paper ID | Paper Name | Authors | Abstract |
---|---|---|---|
3110 | Learning-Based Online Feedback Optimization of Dynamical Systems | Liliaokeawawa Cothren, Emiliano Dall’Anese | Control frameworks for modern autonomous systems and large-scale network systems critically rely on rich data from sensing and perceptual mechanisms. For example, data and measurements are processed to estimate the states and models of the physical system. We investigate how to integrate perceptual and sensing information into feedback controllers inspired by optimization algorithms, where the goal is to steer a dynamical system toward solutions of an optimization problem that encapsulates time-varying costs associated with the system’s inputs and states. In particular, we consider controllers designed by adapting first-order optimization methods and augmented with learning modules that estimate the system state from perceptual information and the system model from both historical and real-time data. These learning modules may be built using neural networks or statistical learning tools. We analyze the performance of the feedback controller and derive sufficient conditions that guarantee (local) input-to-state stability (ISS) of the control loop. The ISS bounds naturally incorporate the effect of the learning errors. We also provide transient and ISS bounds for stochastic learning errors and disturbances. |
3190 | ENFORCER: Guaranteed Conformance with Symbolic Wrappers for Neural Network Dynamics Models | Kaustubh Sridhar, Souradeep Dutta, James Weimer, Insup Lee | Deep neural networks (NNs) serve as fundamental building blocks for modeling complex nonlinear dynamics in robotics and medicine. In safety-critical applications, it is important that data-driven models conform to established knowledge from the natural sciences. This knowledge is often available as a model $M$, for instance a physics simulator of an F1 racing car. Such a model can be a black box, unlike in the case of physics-informed NNs. Our goal is to learn an NN model $f_{\theta}$ that respects such natural constraints while accurately approximating the training distribution. Augmented Lagrangian methods can achieve this to some extent. We propose a tool, ENFORCER, which introduces the idea of guaranteed conformance to any such Lipschitz continuous model $M$. Since our approach uses a symbolic wrapper built with a numerically sound procedure, it is agnostic to the learning algorithm. The approach rests on the following intuition: if restricted to a small enough subset of the input region, the output of the model $M$ can be under-approximated by an interval. If we ensure that the predictions of $f_\theta$ stay within this interval (using a constraining operator $\Gamma$), then the difference between $\Gamma(f_\theta)$ and $M$ is bounded by a quantity proportional to the size of this subset, which shrinks with finer partitioning of the input space. Our constrained neurosymbolic models have been shown to work on three well-known case studies: a car model in CARLA, drones, and an artificial pancreas. We show order-of-magnitude improvements in conformance to natural laws over augmented Lagrangian and standard training, especially on out-of-distribution samples. |
3231 | Continuous-Time Linear-Quadratic ϵ-Greedy Policy | Mohamad Kazem Shirani Faradonbeh | This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that the ϵ-Greedy reinforcement learning policy addresses the exploration-exploitation dilemma in linear control systems that evolve according to unknown stochastic differential equations and whose operating cost has a quadratic form. More precisely, we establish square-root-of-time regret bounds, indicating that ϵ-Greedy learns optimal control actions quickly from a single state trajectory. Further, we show that the regret scales linearly with the number of parameters. The presented analysis introduces novel and useful technical approaches and sheds light on fundamental challenges of continuous-time reinforcement learning. |
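
The feedback-optimization idea in paper 3110 can be illustrated with a minimal sketch. It assumes a discrete-time linear plant $x_{k+1} = A x_k + B u_k$, a hypothetical quadratic cost $\tfrac12\|u\|^2 + \tfrac12\|x - r\|^2$, a box input constraint, a matrix `H_hat` standing in for a learned steady-state model, and additive noise standing in for a perception-based state estimator. The function name, cost, and step size are illustrative choices, not the authors' controller. At each step the controller takes a projected gradient step using the estimated state instead of solving the optimization problem offline.

```python
import numpy as np


def feedback_optimization(A, B, H_hat, r, eta=0.05, steps=200,
                          noise_std=0.0, u_bounds=(-1.0, 1.0), seed=0):
    """Hypothetical projected-gradient feedback controller for x_{k+1} = A x_k + B u_k.

    Cost (an assumption for illustration): phi(u, x) = 0.5*||u||^2 + 0.5*||x - r||^2.
    H_hat plays the role of a learned estimate of the steady-state map
    x_ss = (I - A)^{-1} B u; the noisy x_hat plays the role of a learned state estimate.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    u = np.zeros(m)
    for _ in range(steps):
        x = A @ x + B @ u                               # plant evolves one step
        x_hat = x + noise_std * rng.standard_normal(n)  # imperfect state estimate
        grad = u + H_hat.T @ (x_hat - r)                # gradient through x = H u
        u = np.clip(u - eta * grad, *u_bounds)          # projection onto the box
    return u, x
```

With $A = 0.5 I$ and $B = I$, the steady-state map is $H = 2I$, and the unconstrained minimizer of $\tfrac12\|u\|^2 + \tfrac12\|Hu - r\|^2$ is $u^\star = (I + H^\top H)^{-1} H^\top r = 0.4\, r$. Calling `feedback_optimization(A, B, H, r)` should drive `u` to that point, while a nonzero `noise_std` leaves it in a small neighborhood, which is the kind of behavior an ISS bound quantifies.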
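
The interval-wrapper intuition in paper 3190 can be sketched for a one-dimensional input, assuming $M$ is $L$-Lipschitz and the input region $[lo, hi]$ is split into $N$ equal cells of width $h$. This is a hypothetical construction, not the ENFORCER tool: each cell's interval here is the Lipschitz enclosure $[M(c) - Lh/2,\ M(c) + Lh/2]$ around the cell centre $c$, which may differ from the tool's own interval computation. Since both $M(x)$ and the clamped prediction lie in that interval, $|\Gamma(f_\theta)(x) - M(x)| \le L h$, and the bound shrinks as the partition is refined.

```python
import numpy as np


def make_interval_wrapper(M, lipschitz, lo, hi, n_cells):
    """Build a hypothetical constraining operator Gamma for 1-D inputs on [lo, hi].

    Each cell of width h gets the interval [M(c) - L*h/2, M(c) + L*h/2], where c is
    the cell centre. Because M is L-Lipschitz, M(x) lies in that interval for every x
    in the cell, so clamping any prediction into it keeps the error <= L*h.
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    h = (hi - lo) / n_cells
    lower = M(centres) - lipschitz * h / 2
    upper = M(centres) + lipschitz * h / 2

    def gamma(f):
        def wrapped(x):
            x = np.asarray(x, dtype=float)
            # cell index for each x; the right endpoint hi maps to the last cell
            idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_cells - 1)
            return np.clip(f(x), lower[idx], upper[idx])
        return wrapped

    return gamma
```

For example, wrapping a model that always predicts 0 around $M = \sin$ on $[0, 2\pi]$ (with $L = 1$) keeps the worst-case error at or below $2\pi / N$, regardless of how poor the unwrapped model is.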
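
To make the ε-Greedy structure discussed in paper 3231 concrete, the sketch below simulates a scalar system $dx = (a x + b u)\,dt + dW$ with Euler-Maruyama and quadratic cost $\int (q x^2 + r u^2)\,dt$. With probability $\epsilon_t$ it applies a random action; otherwise it applies the certainty-equivalent gain computed from least-squares estimates of $(a, b)$ and the scalar Riccati equation. The exploration schedule $\epsilon_t = \min(1, t^{-1/2})$, the Gaussian exploratory actions, the refit period, the initial gain `k0`, and all function names are illustrative assumptions, not the paper's policy, and no regret guarantee is implied.

```python
import numpy as np


def care_gain(a, b, q, r):
    """Optimal gain k (u = -k x) for scalar dx = (a x + b u) dt + dW with cost
    integral of (q x^2 + r u^2) dt; p is the positive root of the scalar Riccati
    equation 2 a p - (b^2 / r) p^2 + q = 0."""
    p = (a + np.sqrt(a**2 + b**2 * q / r)) * r / b**2
    return b * p / r


def epsilon_greedy_lq(a, b, q=1.0, r=1.0, T=1000.0, dt=0.01, k0=2.0,
                      explore_scale=3.0, refit_every=1000, seed=0):
    """Hypothetical epsilon-greedy certainty-equivalent controller for an unknown
    scalar SDE. The true a and b are used only by the Euler-Maruyama simulator."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    dw = np.sqrt(dt) * rng.standard_normal(n)         # Brownian increments
    coins = rng.random(n)                             # exploration coin flips
    explore = explore_scale * rng.standard_normal(n)  # random exploratory actions
    Z = np.empty((n, 2))                              # regressors (x, u)
    Y = np.empty(n)                                   # finite-difference dx/dt
    x, k_hat, theta = 0.0, k0, np.array([np.nan, np.nan])
    for i in range(n):
        t = (i + 1) * dt
        eps = min(1.0, 1.0 / np.sqrt(t))              # decaying exploration probability
        u = explore[i] if coins[i] < eps else -k_hat * x
        dx = (a * x + b * u) * dt + dw[i]
        Z[i] = (x, u)
        Y[i] = dx / dt
        x += dx
        if (i + 1) % refit_every == 0:
            # least-squares estimate of (a, b), then the certainty-equivalent gain
            theta = np.linalg.lstsq(Z[: i + 1], Y[: i + 1], rcond=None)[0]
            a_hat, b_hat = theta
            if abs(b_hat) > 1e-3:
                k_hat = care_gain(a_hat, b_hat, q, r)
    return theta, k_hat
```

With $a = b = q = r = 1$ the optimal gain is $k^\star = 1 + \sqrt{2}$. A run of `epsilon_greedy_lq(1.0, 1.0)` should return estimates near $(1, 1)$ and a gain that stabilizes the true system, though the exact numbers depend on the seed.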