Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning

Read original: arXiv:2406.15931 - Published 6/26/2024 by Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias

Overview

Reducing operation and maintenance costs is a key objective for advanced nuclear reactors, especially microreactors.
Developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation.
Reinforcement learning (RL) algorithms have shown promise in control problems, such as in fusion tokamaks and building energy management.
This work explores the use of RL for intelligent control in nuclear microreactors.

Plain English Explanation

Nuclear power plants, including small-scale microreactors, need to be operated efficiently and cost-effectively. One way to achieve this is by developing advanced control systems that can automatically manage the reactor without constant human supervision.

Reinforcement learning (RL) is a type of artificial intelligence that has shown promise in controlling complex systems. RL algorithms learn by trial-and-error, exploring different actions and learning which ones work best to achieve their goals.

In this research, the authors used two cutting-edge RL techniques, Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), to train an RL agent to control a simulated nuclear microreactor. The agent learned how to adjust the position of the reactor's control drums to maintain the reactor in a safe, stable, and efficient state as the fuel is used up over time.

The results demonstrate that the PPO-based RL agent controlled the simulated reactor effectively, keeping the power distribution balanced and the reactor critical (i.e., sustaining the nuclear chain reaction at a steady rate). This suggests that RL could be a powerful tool for enabling real-time autonomous control of nuclear reactors through "digital twins" - virtual simulations of the actual reactor.

Technical Explanation

The researchers used a high-fidelity simulation of a nuclear microreactor design inspired by the Westinghouse eVinci[^1] concept to train their RL control policies. They first utilized a Serpent[^2] model to generate data on the reactor's control drum positions, core criticality, and power distribution. This data was then used to train a feedforward neural network surrogate model, which could quickly estimate the reactor's state given the control drum positions.
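To make the surrogate idea concrete, here is a minimal sketch of a feedforward network that maps drum angles to a criticality estimate and six hextant power fractions. It is not the authors' model: the drum count, layer sizes, angle scaling, softmax output, and the bound on k_eff are all assumptions chosen for illustration, and the weights are random rather than trained on Serpent data.

```python
import math
import random

N_DRUMS = 8      # hypothetical drum count; the summary does not state it
N_HEXTANTS = 6   # six core sections, as described in the text

def init_layer(n_in, n_out, rng):
    """Create one dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation=None):
    """Apply y = activation(W x + b) to a list of floats."""
    weights, biases = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in y] if activation else y

def softmax(values):
    """Normalize raw outputs into positive fractions that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def surrogate_forward(drum_angles_deg, layers):
    """Map drum angles to (k_eff estimate, six hextant power fractions)."""
    x = [a / 180.0 - 1.0 for a in drum_angles_deg]  # assumed range 0..360 deg -> -1..1
    for layer in layers[:-1]:
        x = dense(x, layer, math.tanh)
    out = dense(x, layers[-1])
    k_eff = 1.0 + 0.05 * math.tanh(out[0])  # assumed bound: keep estimate near critical
    return k_eff, softmax(out[1:])

rng = random.Random(0)
sizes = [N_DRUMS, 32, 32, 1 + N_HEXTANTS]  # hypothetical architecture
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
k_eff, fractions = surrogate_forward([90.0] * N_DRUMS, layers)
print(k_eff, fractions)
```

The point of such a surrogate is speed: a forward pass like this takes a tiny fraction of the time a Monte Carlo transport run needs, which is what makes the many evaluations required by RL training affordable.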

The researchers then trained two state-of-the-art RL algorithms, Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), to learn control policies that could identify the optimal drum positions to maintain critical core conditions and a symmetrical power distribution across the reactor's six core sections.
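The control task can be framed as an environment in which each step corresponds to one burnup state and the action adjusts the drums. The sketch below is an assumption-heavy illustration, not the paper's setup: `toy_surrogate` uses invented physics in place of the trained surrogate, and the drum count, number of burnup steps, rotation limit, and reward weights are all hypothetical. A random policy stands in where a trained PPO or A2C agent would go.

```python
import math
import random

N_DRUMS = 6          # hypothetical: one drum per hextant
N_BURNUP_STEPS = 5   # hypothetical number of burnup steps in the cycle
MAX_TURN_DEG = 10.0  # hypothetical limit on drum rotation per action

def toy_surrogate(angles_deg, burnup_step):
    """Invented stand-in for the trained surrogate: returns (k_eff, hextant powers)."""
    # 1.0 when the absorber faces the core (0 deg), 0.0 when it faces away (180 deg).
    absorber_in = [0.5 * (1.0 + math.cos(math.radians(a))) for a in angles_deg]
    k_eff = 1.03 - 0.005 * burnup_step - 0.04 * sum(absorber_in) / N_DRUMS
    raw = [1.0 - 0.1 * f for f in absorber_in]  # absorber depresses local power
    total = sum(raw)
    return k_eff, [p / total for p in raw]

def reward(k_eff, powers):
    """Penalize distance from criticality (in pcm) and power tilt above 1."""
    rho_pcm = (k_eff - 1.0) / k_eff * 1e5
    tilt = max(powers) / (sum(powers) / len(powers))
    return -(abs(rho_pcm) / 100.0 + 100.0 * (tilt - 1.0))  # hypothetical weights

class DrumEnv:
    """One episode is one pass through the burnup steps, one drum adjustment each."""

    def reset(self):
        self.burnup_step = 0
        self.angles = [90.0] * N_DRUMS
        return self._observe()

    def _observe(self):
        return [self.burnup_step / N_BURNUP_STEPS] + [a / 180.0 for a in self.angles]

    def step(self, action):
        for i, turn in enumerate(action):
            turn = max(-MAX_TURN_DEG, min(MAX_TURN_DEG, turn))
            self.angles[i] = max(0.0, min(180.0, self.angles[i] + turn))
        k_eff, powers = toy_surrogate(self.angles, self.burnup_step)
        self.burnup_step += 1
        done = self.burnup_step >= N_BURNUP_STEPS
        return self._observe(), reward(k_eff, powers), done

# Random policy as a placeholder for a trained PPO or A2C agent.
rng = random.Random(0)
env = DrumEnv()
obs, done, episode_return = env.reset(), False, 0.0
while not done:
    action = [rng.uniform(-MAX_TURN_DEG, MAX_TURN_DEG) for _ in range(N_DRUMS)]
    obs, r, done = env.step(action)
    episode_return += r
print(f"episode return with random actions: {episode_return:.2f}")
```

Combining both objectives into a single scalar reward is one common design choice; the relative weights determine how the agent trades criticality error against power asymmetry and would need tuning in practice.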

The results showed that the PPO-based RL agent performed well, achieving a "hextant power tilt ratio" (a measure of power distribution symmetry) of approximately 1.002, within the limit of less than 1.02, and maintaining criticality within a 10 pcm range (pcm, or per cent mille, is a unit of reactivity equal to 10^-5). In contrast, the A2C agent was less competitive on these metrics across the burnup steps considered.
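For readers unfamiliar with the two metrics, the block below writes them out. The reactivity formula is the standard definition; the tilt ratio is written in a commonly used form (peak hextant power over the hextant average), which is an assumption here since the summary does not give the paper's exact definition. The abbreviation HPTR and the example numbers are illustrative only.

```latex
\rho\,[\mathrm{pcm}] = \frac{k_\mathrm{eff} - 1}{k_\mathrm{eff}} \times 10^{5},
\qquad
\mathrm{HPTR} = \frac{\max_{i=1,\dots,6} P_i}{\tfrac{1}{6}\sum_{i=1}^{6} P_i}

% Worked example with made-up numbers:
% k_eff = 1.0001  =>  rho = (0.0001 / 1.0001) x 10^5 \approx 10.0 pcm
% P = (1.002, 0.999, 1.000, 0.998, 1.001, 1.000), mean = 1.000  =>  HPTR = 1.002
```

A perfectly symmetric core gives a tilt ratio of exactly 1, so the reported value of about 1.002 corresponds to the hottest hextant running roughly 0.2% above the average.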

These findings suggest that well-trained RL control policies can quickly identify the necessary control actions, opening the door for real-time autonomous control of nuclear reactors through digital twins.

[^1]: The Westinghouse eVinci is a microreactor design that aims to provide a compact, passively safe, and easily deployable nuclear power solution.
[^2]: Serpent is a Monte Carlo neutron transport code used for reactor physics calculations and simulations.

Critical Analysis

The paper demonstrates the potential of reinforcement learning to enable autonomous control of nuclear microreactors, which could lead to significant reductions in operation and maintenance costs. However, it is important to note that this research was conducted in a simulated environment, and further validation in real-world scenarios would be necessary before deploying such RL-based control systems in actual nuclear reactors.

Additionally, the paper does not address potential safety concerns or regulatory hurdles that would need to be overcome before RL-based control systems could be implemented in nuclear reactors. Integrating RL with traditional model-predictive control approaches may be one way to address these concerns and provide a more comprehensive control solution.

Overall, this research represents an important step towards the development of advanced autonomous control systems for nuclear microreactors, but more work is still needed to fully validate and deploy these techniques in real-world settings.

Conclusion

This paper explores the use of reinforcement learning (RL) algorithms, specifically Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), for intelligent control of nuclear microreactors. The results demonstrate that a well-trained PPO-based RL agent can effectively maintain critical core conditions and a symmetrical power distribution in a simulated microreactor, suggesting that RL could be a promising approach for enabling real-time autonomous control of nuclear reactors through digital twins. However, further research is needed to address safety and regulatory concerns before such RL-based control systems could be implemented in real-world nuclear power plants.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning

Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias

Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapidly increasing application to control problems, such as plasma control in fusion tokamaks and building energy management. In this work, we introduce the use of RL for intelligent control in nuclear microreactors. The RL agent is trained using proximal policy optimization (PPO) and advantage actor-critic (A2C), cutting-edge deep RL techniques, based on a high-fidelity simulation of a microreactor design inspired by the Westinghouse eVinci™ design. We utilized a Serpent model to generate data on drum positions, core criticality, and core power distribution for training a feedforward neural network surrogate model. This surrogate model was then used to guide PPO and A2C control policies in determining the optimal drum position across various reactor burnup states, ensuring critical core conditions and symmetrical power distribution across all six core portions. The results demonstrate the excellent performance of PPO in identifying optimal drum positions, achieving a hextant power tilt ratio of approximately 1.002 (within the limit of < 1.02) and maintaining criticality within a 10 pcm range. A2C was not as competitive as PPO on the performance metrics for all burnup steps considered in the cycle. Additionally, the results highlight the capability of well-trained RL control policies to quickly identify control actions, suggesting a promising approach for enabling real-time autonomous control through digital twins.

6/26/2024

Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning

Paul Seurin, Koroush Shirvan

Optimizing the fuel cycle cost through the optimization of nuclear reactor core loading patterns involves multiple objectives and constraints, leading to a vast number of candidate solutions that cannot be explicitly solved. To advance the state-of-the-art in core reload patterns, we have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization. Our previous research has laid the groundwork for these approaches and demonstrated their ability to discover high-quality patterns within a reasonable time frame. On the other hand, stochastic optimization (SO) approaches are commonly used in the literature, but there is no rigorous explanation that shows which approach is better in which scenario. In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO), against the most commonly used SO-based methods: Genetic Algorithm (GA), Parallel Simulated Annealing (PSA) with mixing of states, and Tabu Search (TS), as well as an ensemble-based method, Prioritized Replay Evolutionary and Swarm Algorithm (PESA). We found that the LP scenarios derived in this paper are amenable to a global search to identify promising research directions rapidly, but then need to transition into a local search to exploit these directions efficiently and prevent getting stuck in local optima. PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method. Subsequently, we compared all algorithms against PPO in long runs, which exacerbated the differences seen in the shorter cases. Overall, the work demonstrates the statistical superiority of PPO compared to the other considered algorithms.

7/16/2024

Design Optimization of Nuclear Fusion Reactor through Deep Reinforcement Learning

Jinsu Kim, Jaemin Seo

This research explores the application of Deep Reinforcement Learning (DRL) to optimize the design of a nuclear fusion reactor. DRL can efficiently address the challenging issues attributed to multiple physics and engineering constraints for steady-state operation. The fusion reactor design computation and the optimization code applicable to parallelization with DRL are developed. The proposed framework enables finding the optimal reactor design that satisfies the operational requirements while reducing building costs. Multi-objective design optimization for a fusion reactor is now simplified by DRL, indicating the high potential of the proposed framework for advancing the efficient and sustainable design of future reactors.

9/14/2024

AI Enabled Neutron Flux Measurement and Virtual Calibration in Boiling Water Reactors

Anirudh Tunga, Jordan Heim, Michael Mueterthies, Thomas Gruenwald, Jonathan Nistor

Accurately capturing the three dimensional power distribution within a reactor core is vital for ensuring the safe and economical operation of the reactor, compliance with Technical Specifications, and fuel cycle planning (safety, control, and performance evaluation). Offline (that is, during cycle planning and core design), a three dimensional neutronics simulator is used to estimate the reactor's power, moderator, void, and flow distributions, from which margin to thermal limits and fuel exposures can be approximated. Online, this is accomplished with a system of local power range monitors (LPRMs) designed to capture enough neutron flux information to infer the full nodal power distribution. Certain problems with this process, ranging from measurement and calibration to the power adaption process, pose challenges to operators and limit the ability to design reload cores economically (e.g., engineering in insufficient margin or more margin than required). Artificial intelligence (AI) and machine learning (ML) are being used to solve the problems to reduce maintenance costs, improve the accuracy of online local power measurements, and decrease the bias between offline and online power distributions, thereby leading to a greater ability to design safe and economical reload cores. We present ML models trained from two deep neural network (DNN) architectures, SurrogateNet and LPRMNet, that demonstrate a testing error of 1 percent and 3 percent, respectively. Applications of these models can include virtual sensing capability for bypassed or malfunctioning LPRMs, on demand virtual calibration of detectors between successive calibrations, highly accurate nuclear end of life determinations for LPRMs, and reduced bias between measured and predicted power distributions within the core.

9/27/2024