A microgrid is a promising distributed energy supply technology that consists of storage devices, generation capacities including renewable sources, and controllable loads. Microgrids have been widely investigated and applied for residential and commercial end-use customers as well as for critical facilities. In this paper, we propose a joint state-based dynamic control model for microgrids and manufacturing systems, in which optimal controls on both sides are implemented to coordinate energy demand and supply so that the overall production cost is minimized subject to a production target constraint. A Markov Decision Process (MDP) is used to formulate the decision-making procedure. The main computational challenge in solving the formulated MDP lies in the coexistence of discrete and continuous components in the high-dimensional state/action space, which are intertwined with constraints. A novel reinforcement learning algorithm that leverages both Temporal Difference (TD) and Deterministic Policy Gradient (DPG) algorithms is proposed to address this challenge. Experiments on a manufacturing system with an onsite microgrid that includes renewable sources were conducted to validate the effectiveness of the proposed method.
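The abstract does not spell out how TD and DPG are combined. As a rough illustration only, the sketch below assumes one common pattern for hybrid discrete/continuous actions: each discrete choice `d` carries its own linear critic Q_d(s, u), learned by a TD(0) update, and its own deterministic actor mu_d(s), improved by a deterministic policy gradient step; the greedy hybrid action evaluates every `d` at u = mu_d(s) and keeps the best. The scalar state, the quadratic feature map, the linear actor, the learning rates, and all function names are hypothetical, and the production-target and energy-balance constraints from the paper are not modeled.

```python
from typing import List, Tuple

# Hypothetical toy setting (not the authors' model): s is a scalar state, d indexes
# a discrete action, and u is a scalar continuous action attached to that d.

def features(s: float, u: float) -> List[float]:
    """Quadratic features, so Q_d(s, u) can be concave in u and has a simple du-gradient."""
    return [1.0, s, u, s * u, u * u]

def q_value(w: List[float], s: float, u: float) -> float:
    """Linear critic Q_d(s, u) = w . phi(s, u)."""
    return sum(wi * fi for wi, fi in zip(w, features(s, u)))

def actor(theta: List[float], s: float) -> float:
    """Deterministic linear policy mu_d(s) = theta[0] + theta[1] * s."""
    return theta[0] + theta[1] * s

def greedy_action(W: List[List[float]], Theta: List[List[float]], s: float) -> Tuple[int, float]:
    """Evaluate every discrete d at u = mu_d(s) and return the (d, u) pair with the highest Q."""
    best = max(range(len(W)), key=lambda d: q_value(W[d], s, actor(Theta[d], s)))
    return best, actor(Theta[best], s)

def td_dpg_step(W, Theta, s, d, u, r, s_next, gamma=0.9, alpha=0.1, beta=0.05) -> float:
    """One TD(0) critic update for the taken (s, d, u), then one DPG update of actor mu_d.

    W and Theta are modified in place; the TD error is returned.
    """
    # TD(0) target bootstraps from the greedy hybrid action at the next state.
    d_next, u_next = greedy_action(W, Theta, s_next)
    target = r + gamma * q_value(W[d_next], s_next, u_next)
    delta = target - q_value(W[d], s, u)
    phi = features(s, u)
    W[d] = [wi + alpha * delta * fi for wi, fi in zip(W[d], phi)]

    # DPG: grad_theta Q_d(s, mu_d(s)) = dQ_d/du at u = mu_d(s), times dmu_d/dtheta = [1, s].
    mu = actor(Theta[d], s)
    w = W[d]
    dq_du = w[2] + w[3] * s + 2.0 * w[4] * mu
    Theta[d] = [Theta[d][0] + beta * dq_du, Theta[d][1] + beta * dq_du * s]
    return delta
```

For example, with Q_1(s, u) = 1 - u^2, Q_0 identically zero, and both actors at zero, `greedy_action` picks d = 1 at s = 0, and one step on the transition (s = 0.5, d = 1, u = 0.5, r = 2) with gamma = 0.5 gives a TD error of 1.75. Keeping a separate critic and actor per discrete choice is only one way to handle the discrete/continuous split; it scales with the number of discrete options, which a real high-dimensional model may not tolerate.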


Department

Mathematics and Statistics

Second Department

Engineering Management and Systems Engineering

Keywords and Phrases

Deterministic Policy Gradient; Manufacturing; Markov Decision Process; Microgrid; Reinforcement Learning; Temporal Difference Learning

Document Type

Article - Journal

Document Version

Final Version

© 2023 Elsevier. All rights reserved.

Publication Date

01 Jun 2022