Novel Approaches to Support Energy Systems: Analysis & Learning II – Invited Special Session


Session Type: Lecture
Session Code: B5L-F
Location: Room 6
Date & Time: Thursday March 23, 2023 (15:20-16:20)
Chair: Yury Dvorkin, Sijia Geng
Track: 12
Paper ID: 3034
Title: Deep Learning for Optimal Volt/VAR Control Using Distributed Energy Resources
Authors: Sarthak Gupta{2}, Spyros Chatzivasileiadis{1}, Vassilis Kekatos{2}
Abstract: Given their intermittency, distributed energy resources (DERs) have been commissioned with regulating voltages at fast timescales. Although the IEEE 1547 standard specifies the shape of Volt/VAR control rules, it is not clear how to optimally customize them per DER. Optimal rule design (ORD) is a challenging problem, as Volt/VAR rules introduce nonlinear dynamics, require bilinear optimization models, and entail trade-offs between stability and steady-state performance. To tackle ORD, we develop a deep neural network (DNN) that serves as a digital twin of Volt/VAR dynamics. The DNN takes grid conditions as inputs, uses rule parameters as weights, and computes equilibrium voltages as outputs. Owing to this design, ORD is reformulated as a deep learning task that uses grid scenarios as training data and aims to drive the predicted equilibrium voltages close to unity. The learning task is solved by modifying efficient deep-learning routines to enforce constraints on rule parameters. In the course of DNN-based ORD, we also review and expand stability conditions and convergence rates for Volt/VAR rules on single- and multi-phase feeders. To benchmark the optimality and runtime of DNN-based ORD, we also devise a novel mixed-integer nonlinear program formulation. Numerical tests showcase the merits of DNN-based ORD.
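The premise of this abstract — droop-style Volt/VAR rules forming a fixed-point iteration whose equilibrium voltages depend on the rule parameters, with stability governed by a spectral-radius condition — can be illustrated on a toy linearized feeder. All matrices, values, and the simplified linear (deadband-free) droop rule below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def volt_var_equilibrium(X, v0, alpha, iters=200):
    """Iterate a linear Volt/VAR droop rule to its equilibrium voltages.

    Assumed linearized grid model: v = v0 + X @ q, where X is a toy
    reactance matrix; each DER applies q_{t+1} = -alpha * (v_t - 1).
    The iteration converges when rho(diag(alpha) @ X) < 1.
    """
    q = np.zeros(len(v0))
    for _ in range(iters):
        v = v0 + X @ q          # grid voltage response to reactive injections
        q = -alpha * (v - 1.0)  # linear droop rule (no deadband, for brevity)
    return v0 + X @ q

X = np.array([[0.08, 0.02], [0.02, 0.10]])  # hypothetical reactance matrix
v0 = np.array([1.05, 1.04])                 # uncontrolled voltages (p.u.)
alpha = np.array([2.0, 2.0])                # droop slopes = the rule parameters

# Stability check: spectral radius of diag(alpha) @ X must be below 1.
rho = max(abs(np.linalg.eigvals(np.diag(alpha) @ X)))
v_eq = volt_var_equilibrium(X, v0, alpha)
print(rho < 1)                                        # stable rule
print(np.all(np.abs(v_eq - 1.0) < np.abs(v0 - 1.0)))  # voltages pulled toward 1
```

In the paper's framing, unrolling this fixed-point iteration is what turns the rule parameters (here `alpha`) into trainable DNN weights, so gradient-based training can push the equilibrium voltages toward unity.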
Paper ID: 3080
Title: Risk-Constrained Reinforcement Learning for Wide-Area Damping Control
Authors: Kyung-Bin Kwon, Hao Zhu
Abstract: This work develops a risk-constrained reinforcement learning (RL) approach for the wide-area damping control (WADC) problem in power systems. The integration of inverter-based resources (IBRs) motivates a data-driven RL approach to learning the best control policy, addressing the lack of modeling information for IBRs. In addition, uncertain communication delays are known to critically affect the WADC design. To tackle this issue, we advocate incorporating a mean-variance risk constraint into the linear quadratic regulator (LQR) based RL objective and reformulating the former as a quadratic constraint. This reformulation allows the dual problem to be solved using a data-driven Stochastic Gradient Descent with Max-oracle (SGDmax) algorithm, which has been shown to converge to the KKT stationarity conditions with high probability. Numerical tests on the IEEE 68-bus test case demonstrate improved performance under communication delays, in terms of reduced frequency deviations and inter-area oscillations.
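The primal-dual mechanics behind a mean-variance risk constraint on an LQR-style objective can be sketched on a hypothetical scalar system: descend on the policy gain for the Lagrangian (the "max-oracle" direction), ascend on the dual variable of the variance constraint. The system, step sizes, variance budget, and finite-difference gradients below are illustrative assumptions, not the paper's SGDmax implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar system (assumed): x_{t+1} = a x_t + b u_t + w_t,
# linear policy u_t = -k x_t, stage cost c_t = q x^2 + r u^2.
a, b, q, r, sigma = 0.9, 1.0, 1.0, 0.1, 0.1

def rollout_cost(k, T=50):
    """Accumulated cost of one noisy rollout under gain k."""
    x, total = 1.0, 0.0
    for _ in range(T):
        u = -k * x
        total += q * x**2 + r * u**2
        x = a * x + b * u + sigma * rng.standard_normal()
    return total

def mean_var(k, n=200):
    """Sample mean and variance of the rollout cost."""
    costs = np.array([rollout_cost(k) for _ in range(n)])
    return costs.mean(), costs.var()

# Primal-dual loop in the spirit of SGDmax:
# minimize L(k, lam) = J(k) + lam * (Var(k) - budget) over k,
# maximize over the dual variable lam >= 0.
budget, lam, k, eps = 5.0, 0.0, 0.2, 0.05
for _ in range(30):
    m_p, v_p = mean_var(k + eps)          # finite-difference gradient of the
    m_m, v_m = mean_var(k - eps)          # Lagrangian w.r.t. the policy gain
    grad = ((m_p + lam * v_p) - (m_m + lam * v_m)) / (2 * eps)
    k -= 0.02 * grad                      # primal descent on the policy
    _, v = mean_var(k)
    lam = max(0.0, lam + 0.05 * (v - budget))  # projected dual ascent

print(k, lam)
```

The quadratic-constraint reformulation mentioned in the abstract is what makes this dual problem tractable with stochastic gradients; the sketch uses noisy finite differences purely for brevity.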
Paper ID: 3061
Title: Efficient Reinforcement Learning Through Trajectory Generation
Authors: Baosen Zhang
Abstract: A key barrier to using reinforcement learning (RL) in many real-world applications is the large number of system interactions required to learn a good control policy. Off-policy and offline RL methods have been proposed to reduce the number of interactions with the physical environment by learning control policies from historical data. However, their performance suffers from the lack of exploration and from distributional shifts in trajectories once controllers are updated. Moreover, most RL methods require that all states be directly observed, which is difficult to attain in many settings. To overcome these challenges, we propose a trajectory generation algorithm that adaptively generates new trajectories as if the system were being operated and explored under the updated control policies. Motivated by the fundamental lemma for linear systems, and assuming sufficient excitation, we generate trajectories from linear combinations of historical trajectories. For linear feedback control, we prove that the algorithm generates trajectories with exactly the same distribution as if they were sampled from the real system using the updated control policy. In particular, the algorithm extends to systems where the states are not directly observed. Experiments show that the proposed method significantly reduces the amount of sampled data needed for RL algorithms.
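The fundamental lemma underlying this trajectory-generation idea says that, under persistently exciting inputs, every trajectory of a linear system is a linear combination of columns of a Hankel matrix built from one recorded trajectory. A minimal sketch for an assumed noiseless LTI system with fully observed states (system matrices and dimensions are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy LTI system x_{t+1} = A x_t + B u_t (noiseless, states observed).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
n, m, L = 2, 1, 5  # state dim, input dim, length of generated trajectory

# One historical trajectory driven by a persistently exciting random input.
T = 40
u_d = rng.standard_normal((T, m))
x_d = np.zeros((T + 1, n))
for t in range(T):
    x_d[t + 1] = A @ x_d[t] + B @ u_d[t]

def hankel(data, depth):
    """Stack length-`depth` windows of `data` as columns."""
    cols = [data[i:i + depth].ravel() for i in range(len(data) - depth + 1)]
    return np.array(cols).T

Hu = hankel(u_d, L)       # input windows, (L*m) x (T-L+1)
Hx = hankel(x_d[:T], L)   # matching state windows, (L*n) x (T-L+1)

# Generate the trajectory for a new input sequence and initial state by
# solving for a combination g that matches the input rows and x_0; the state
# rows of Hx @ g then reproduce the system's response without new rollouts.
u_new = rng.standard_normal((L, m))
x0 = np.array([1.0, -1.0])
lhs = np.vstack([Hu, Hx[:n]])             # constrain inputs and initial state
rhs = np.concatenate([u_new.ravel(), x0])
g, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
x_gen = (Hx @ g).reshape(L, n)            # generated state trajectory

# Validate against directly simulating the system.
x_true = [x0]
for t in range(L - 1):
    x_true.append(A @ x_true[-1] + B @ u_new[t])
print(np.allclose(x_gen, np.array(x_true), atol=1e-6))
```

The input here is random only for excitation; in the paper's setting the new inputs would come from the updated feedback policy, so fresh on-policy trajectories can be synthesized from historical data alone.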