Learning for Optimization & Control III – Invited Special Session


Session Type: Lecture
Session Code: A4L-F
Location: Room 6
Date & Time: Wednesday, March 22, 2023 (14:00 - 15:00)
Chairs: Mahyar Fazlyab, Enrique Mallada
Track: 12
Papers:
Paper ID: 3033
Title: Policy Gradients for Probabilistic Constrained Reinforcement Learning
Authors: Weiqin Chen{2}, Dharmashankar Subramanian{1}, Santiago Paternain{2}
Abstract: This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, a safe policy or controller is one that, with high probability, maintains the trajectory of the agent in a given safe set. We relate this notion of safety to the notion of average safety often considered in the literature by providing theoretical bounds on their relative safety and performance. The challenge of working with the probabilistic notion of safety considered in this work is the lack of expressions for its gradient: policy optimization algorithms rely on gradients of the objective function and the constraints. To the best of our knowledge, this work is the first to provide such explicit gradient expressions for probabilistic constraints. Notably, these probabilistic gradients are naturally algorithm-independent, so they can be applied to a variety of policy-based algorithms. In addition, we consider a continuous navigation problem to empirically illustrate the advantages (in terms of safety and performance) of working with probabilistic constraints as compared to average constraints.
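A probabilistic safety constraint of the kind described above, P(trajectory stays in the safe set) >= 1 - delta, admits a score-function (REINFORCE-style) Monte Carlo gradient estimator. The sketch below is purely illustrative and is not the paper's actual gradient expressions: it uses a one-dimensional Gaussian "policy" N(theta, 1), declares a "trajectory" safe when the sampled action lies in [-1, 1], and all names are chosen here for illustration.

```python
import numpy as np

def safety_prob_and_grad(theta, n=100_000, rng=None):
    """Monte Carlo estimates of P(safe) and its gradient w.r.t. theta,
    using the score-function identity
        d/dtheta P(safe) = E[ 1{safe}(a) * d/dtheta log p(a; theta) ]."""
    rng = rng or np.random.default_rng(0)
    a = rng.normal(theta, 1.0, size=n)        # sample "actions" from N(theta, 1)
    safe = (np.abs(a) <= 1.0).astype(float)   # indicator: inside the safe set [-1, 1]
    score = a - theta                         # d/dtheta log N(a; theta, 1)
    return safe.mean(), (safe * score).mean()

p, g = safety_prob_and_grad(0.5)
# p estimates P(|a| <= 1) = Phi(0.5) - Phi(-1.5) (about 0.62);
# g is negative, i.e. decreasing theta toward 0 increases safety.
```

In this toy the estimated gradient correctly points theta toward 0, the safest mean for the interval [-1, 1]; the abstract's contribution is explicit expressions of this kind for trajectory-level probabilistic constraints.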
Paper ID: 3148
Title: Direct Policy Search for Robust Control: Nonsmooth Optimization Methods and Iteration Complexity
Authors: Bin Hu, Xingang Guo
Abstract: Direct policy search has been widely applied in modern reinforcement learning and continuous control. However, the theoretical properties of direct policy search for nonsmooth robust control synthesis are not fully understood. The optimal H-infinity control framework aims at designing a policy that minimizes the closed-loop H-infinity norm, and is arguably the most fundamental robust control paradigm. In this talk, we discuss the recently developed global convergence theory of direct policy search for the H-infinity state-feedback control design problem. Policy search for optimal H-infinity control leads to a constrained nonconvex nonsmooth optimization problem, where the nonconvex feasible set consists of all policies that stabilize the closed-loop dynamics. We show that for this nonsmooth optimization problem, all Clarke stationary points are global minima. Next, we establish the coerciveness of the closed-loop H-infinity objective function and prove that all sublevel sets of the resulting policy search problem are compact. Based on these properties, we show that Goldstein's subgradient method and its implementable variants are guaranteed to stay in the nonconvex feasible set and eventually find the global optimum of this robust control problem. We then carefully address iteration complexity and clarify the different notions of stationarity involved. Finally, we conclude with several possible extensions.
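Goldstein's subgradient method, referenced above, replaces the (possibly nonexistent) gradient at a nonsmooth point with an element of the Goldstein delta-subdifferential, the convex hull of gradients taken over a delta-ball around the current iterate. The sketch below is a minimal sampled variant on a toy nonsmooth function, not the H-infinity objective; using the sample average of gradients is a crude stand-in for the minimal-norm element, and all names are illustrative.

```python
import numpy as np

def goldstein_step(f_grad, x, delta, m=20, rng=None):
    """One step of a sampled Goldstein-style subgradient method:
    approximate the delta-subdifferential by sampling gradients in a
    delta-ball around x, then take a normalized step of length delta
    against (a surrogate of) its minimal-norm element."""
    rng = rng or np.random.default_rng()
    pts = x + delta * rng.uniform(-1, 1, size=(m, len(x)))
    grads = np.array([f_grad(p) for p in pts])
    g = grads.mean(axis=0)          # crude surrogate for the min-norm element
    norm = np.linalg.norm(g)
    if norm < 1e-8:
        return x                    # (near-)Goldstein-stationary: stop
    return x - delta * g / norm     # normalized step of length delta

# Toy nonsmooth objective f(x) = |x1| + |x2|, with gradient sign(x)
# at differentiable points; the minimum is at the origin.
f = lambda x: np.abs(x).sum()
f_grad = lambda x: np.sign(x)

rng = np.random.default_rng(1)
x = np.array([3.0, -2.0])
for _ in range(200):
    x = goldstein_step(f_grad, x, delta=0.1, rng=rng)
# x ends near the origin, within roughly delta of the minimizer.
```

The normalized delta-length step is what lets the method make progress through kinks where an ordinary subgradient step can stall; the talk's analysis concerns this method's behavior on the much harder constrained H-infinity problem.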
Paper ID: 3093
Title: Safe Planning in Dynamic Environments Using Conformal Prediction
Authors: Lars Lindemann{2}, Matthew Cleaveland{1}, Gihyun Shim{1}, George J. Pappas{1}
Abstract: We propose a framework for planning in unknown dynamic environments with probabilistic safety guarantees using conformal prediction. In particular, we design a model predictive controller (MPC) that uses (i) trajectory predictions of the dynamic environment and (ii) prediction regions quantifying the uncertainty of those predictions. To obtain the prediction regions, we use conformal prediction, a statistical tool for uncertainty quantification that requires the availability of offline trajectory data, a reasonable assumption in many applications such as autonomous driving. The prediction regions are valid, i.e., they hold with a user-defined probability, so that the MPC is provably safe. We illustrate the results in the self-driving car simulator CARLA at a pedestrian-filled intersection. The strength of our approach is its compatibility with state-of-the-art trajectory predictors, e.g., RNNs and LSTMs, while making no assumptions on the underlying trajectory-generating distribution. To the best of our knowledge, these are the first results that provide valid safety guarantees in such a setting.
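The conformal prediction step in the abstract above can be illustrated with split conformal prediction: from held-out nonconformity scores (e.g. a predictor's errors on offline trajectories), take a finite-sample-corrected empirical quantile to obtain a radius that covers a fresh error with probability at least 1 - alpha, regardless of the error distribution. The sketch below is a generic illustration of that recipe, not the paper's construction; all names and the synthetic calibration data are assumptions.

```python
import numpy as np

def conformal_radius(scores, alpha=0.05):
    """Radius of a (1 - alpha) split conformal prediction region,
    computed from calibration nonconformity scores."""
    n = len(scores)
    # Finite-sample correction: ceil((n+1)(1-alpha))/n-th empirical quantile.
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q, method="higher")

# Calibration scores: absolute prediction errors on synthetic
# offline trajectory data (standing in for a real predictor's errors).
rng = np.random.default_rng(0)
cal_errors = np.abs(rng.normal(0.0, 1.0, size=1000))
radius = conformal_radius(cal_errors, alpha=0.05)
# With probability >= 0.95, a fresh error from the same distribution
# falls within `radius` of the prediction.
```

Inflating each predicted obstacle position by such a radius yields the valid prediction regions the MPC then treats as constraints; no assumption on the trajectory-generating distribution is needed beyond exchangeability of the calibration and test data.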