Publication

Using process data to generate an optimal control policy via apprenticeship and reinforcement learning

Mowbray, Max; orcid: 0000-0003-1398-0469; email: max.mowbray@manchester.ac.uk
Smith, Robin; email: robin.smith@manchester.ac.uk
Del Rio‐Chanona, Ehecatl A.; email: a.del-rio-chanona@imperial.ac.uk
Zhang, Dongda; orcid: 0000-0001-5956-4618; email: dongda.zhang@manchester.ac.uk
Publication Date
2021-05-15
Submitted Date
2020-10-04
Abstract
Reinforcement learning (RL) is a data-driven approach to synthesizing an optimal control policy. A barrier to wide implementation of RL-based controllers is their data-hungry nature during online training and their inability to extract useful information from human operator and historical process operation data. Here, we present a two-step framework to resolve this challenge. First, we employ apprenticeship learning via inverse RL to analyze historical process data for synchronous identification of a reward function and a parameterization of the control policy; this step is conducted offline. Second, the parameterization is efficiently improved online via RL on the ongoing process within only a few iterations. Significant advantages of this framework include hot-starting RL algorithms for process optimal control and robustly abstracting existing controllers and control knowledge from data. The framework is demonstrated on three case studies, showing its potential for chemical process control.
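
To make the two-step structure described above concrete, the following is a minimal Python sketch of the idea under strong simplifying assumptions: a toy scalar process, a linear policy, a reward that is linear in two hand-picked features, a feature-matching style of inverse RL, and a derivative-free policy search standing in for the RL update. None of the names, dynamics, or numbers come from the paper; they are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s, a):
    # Reward features (assumed): squared setpoint-tracking error and control effort.
    return np.array([-(s - 1.0) ** 2, -(a ** 2)])

def plant(s, a):
    # Toy first-order process dynamics, used only for illustration.
    return 0.9 * s + 0.5 * a + 0.01 * rng.standard_normal()

def rollout(theta, horizon=25):
    # Episode under a linear policy a = theta[0]*s + theta[1]; returns the mean feature vector.
    s, phis = 0.0, []
    for _ in range(horizon):
        a = theta[0] * s + theta[1]
        phis.append(features(s, a))
        s = plant(s, a)
    return np.mean(phis, axis=0)

def improve(theta, w, iters=50, sigma=0.1, lr=0.05):
    # Derivative-free policy search against the linear reward w . phi (stand-in for the RL step).
    for _ in range(iters):
        eps = sigma * rng.standard_normal(2)
        gain = w @ rollout(theta + eps) - w @ rollout(theta - eps)
        theta = theta + lr * gain * eps / (2 * sigma ** 2)
    return theta

# Step 1 (offline): apprenticeship learning via feature-matching inverse RL on "historical" data.
expert_theta = np.array([-0.55, 0.75])          # stands in for operator / historical behaviour
mu_expert = np.mean([rollout(expert_theta) for _ in range(20)], axis=0)

w, theta = np.zeros(2), np.zeros(2)
for _ in range(30):
    w += 0.2 * (mu_expert - rollout(theta))     # nudge reward weights toward the expert's features
    theta = improve(theta, w, iters=20)         # re-fit the policy under the current reward estimate

# Step 2 (online): hot-started RL refinement on the (here, simulated) live process.
theta = improve(theta, w, iters=100)
print("identified reward weights:", w)
print("hot-started, refined policy parameters:", theta)
```

The point of the sketch is that Step 2 reuses both the reward weights and the policy parameters identified offline, which is the "hot-start" the abstract refers to; the online RL stage therefore only fine-tunes rather than learning from scratch.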
Citation
AIChE Journal, page e17306
Publisher
John Wiley & Sons, Inc.
Type
article
Description
From Wiley via Jisc Publications Router
History: received 2020-10-04; revised version received 2021-04-23; accepted 2021-05-03; published online 2021-05-15
Article version: Version of Record (VoR)
Publication status: Published