Avoiding fusion plasma tearing instability with deep reinforcement learning – Nature.com

DIII-D

The DIII-D National Fusion Facility, located at General Atomics in San Diego, USA, is a leading research facility dedicated to advancing fusion energy through experimental and theoretical research. The facility is home to the DIII-D tokamak, the largest and most advanced magnetic fusion device in the United States. The major and minor radii of DIII-D are 1.67 m and 0.67 m, respectively. The toroidal magnetic field can reach 2.2 T, the plasma current 2.0 MA and the external heating power 23 MW. DIII-D is equipped with high-resolution real-time plasma diagnostic systems, including a Thomson scattering system45, charge-exchange recombination spectroscopy46 and magnetohydrodynamic equilibrium reconstruction by EFIT37,39. These diagnostics provide real-time profiles of electron density, electron temperature, ion temperature, ion rotation, pressure, current density and safety factor. In addition, DIII-D can flexibly control the total beam power and torque through reliable high-frequency modulation of eight neutral beams injected in different directions. DIII-D is therefore an ideal device for verifying and using our AI controller, which observes the plasma state and manipulates the actuators in real time.

One of the unique features of the DIII-D tokamak is its advanced plasma control system (PCS)47, which allows researchers to control the plasma precisely in real time. This makes it possible to study plasma behaviour under a wide range of conditions and to test ideas for controlling and stabilizing the plasma. The PCS is a hierarchy of real-time controllers, from the magnetic control system (low-level control) to the profile control system (high-level control). Our tearing-avoidance algorithm is implemented within this hierarchy and is integrated with existing lower-level controllers, such as the plasma boundary control algorithm39,41 and the individual beam control algorithm40.

Magnetic reconnection is the phenomenon in magnetized plasmas in which magnetic-field lines are torn and reconnected owing to the diffusion of magnetic flux (ψ) by plasma resistivity. It is ubiquitous, occurring in diverse environments such as the solar atmosphere, the Earth's magnetosphere, plasma thrusters and laboratory plasmas such as tokamaks. In the nested magnetic-field structure of a tokamak, reconnection at surfaces where the safety factor q is a rational number separates field lines and creates magnetic islands. When these islands grow and become unstable, the process is termed tearing instability. Classically, the growth rate of the tearing instability depends on the tearing stability index, Δ′, defined in equation (2).

$$\Delta' \equiv \left[\frac{1}{\psi}\frac{\mathrm{d}\psi}{\mathrm{d}x}\right]_{x=0^{-}}^{x=0^{+}}$$

(2)

where x is the radial distance from the rational surface. When Δ′ is positive, the magnetic topology is unstable and (classical) tearing instability can develop. However, even when Δ′ is negative (so classical tearing instability does not grow), neoclassical tearing instability can arise from geometric effects or the drift of charged particles, which can amplify seed perturbations. The altered magnetic topology can then either saturate without further growth48,49 or couple with other magnetohydrodynamic events or plasma turbulence50,51,52,53. Understanding and controlling these tearing instabilities is paramount for achieving stable and sustainable fusion reactions in a tokamak54.
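As a rough numerical illustration of equation (2), the sketch below approximates Δ′ by evaluating the logarithmic derivative ψ′/ψ just to the right and left of the rational surface and taking the jump. The functions `log_derivative` and `delta_prime`, the offsets `eps` and `h`, and the two analytic test profiles are illustrative assumptions; in practice Δ′ comes from outer-region MHD solutions rather than closed-form ψ.

```python
import math
from typing import Callable


def log_derivative(psi: Callable[[float], float], x: float, h: float) -> float:
    """Central-difference estimate of (1/psi) dpsi/dx at x."""
    return (psi(x + h) - psi(x - h)) / (2.0 * h) / psi(x)


def delta_prime(psi: Callable[[float], float], eps: float = 1e-3, h: float = 1e-5) -> float:
    """Approximate the jump [psi'/psi] across the rational surface at x = 0.

    The limits x -> 0+ and x -> 0- are replaced by x = +eps and x = -eps,
    with h << eps so each stencil stays on one side of the surface.
    """
    return log_derivative(psi, +eps, h) - log_derivative(psi, -eps, h)


# Outer solution decaying away from the surface: psi ~ exp(-k|x|) gives Delta' = -2k.
stable = delta_prime(lambda x: math.exp(-2.0 * abs(x)))
# Outer solution growing away from the surface: psi ~ 1 + a|x| gives Delta' close to 2a.
unstable = delta_prime(lambda x: 1.0 + 0.5 * abs(x))
print(f"Delta' = {stable:.3f}, classically unstable: {stable > 0}")
print(f"Delta' = {unstable:.3f}, classically unstable: {unstable > 0}")
```

The sign test captures only the classical criterion; as noted above, neoclassical effects can drive tearing even when Δ′ < 0.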

The ITER baseline scenario (IBS) is an operational condition designed for ITER to achieve a fusion power of Pfusion = 500 MW and a fusion gain of Q ≡ Pfusion/Pexternal = 10 for longer than 300 s (ref. 12). Compared with present tokamak experiments, the IBS is notable for its considerably low edge safety factor (q95 ≈ 3) and low toroidal torque. Using its PCS, DIII-D can access the IBS condition more reliably than other devices; however, many IBS experiments have been terminated by disruptive tearing instabilities19. This is because, when q95 is low, the tearing instability at the q = 2 surface appears close to the wall and readily locks to it, leading to disruption when the plasma rotation frequency is low. Therefore, in this study, we tested the AI tearability controller at q95 ≈ 3 and low toroidal torque (≲1 N m), where disruptive tearing instabilities are easily excited.

Besides the IBS, where tearing instability is a critical issue, ITER has other scenarios, such as the hybrid and non-inductive scenarios12. These are less prone to tearing-induced disruption, but each has its own challenges, such as the no-wall stability limit or minimizing the inductive current. It is therefore worth developing further AI controllers, trained with modified observation, actuation and reward settings, to address these challenges. In addition, the flexibility of the actuators and sensors used in this work at DIII-D will differ from that in ITER and reactors, so control policies for more limited sensing and actuation also need to be developed in the future.

To predict tearing events in DIII-D, we first labelled each phase of the experimental data as tearing-stable or not (0 or 1) based on the n = 1 Mirnov coil signal. Using these labelled data, we trained a DNN-based multimodal dynamic model that takes various plasma profiles and tokamak actuations as input and outputs the tearing likelihood 25 ms ahead. The trained model outputs a continuous value between 0 and 1 (the so-called tearability), where a value closer to 1 indicates a higher likelihood of tearing instability occurring 25 ms later. The architecture of this model is shown in Extended Data Fig. 1. Detailed descriptions of the input and output variables and the hyperparameters of the dynamic model can be found in ref. 5. Although this model is a black box and cannot explicitly identify the underlying cause of the induced tearing instability, it can serve as a surrogate for the stability response, bypassing expensive real-world experiments. In this work, for example, it is used as the training environment for RL of the tearing-avoidance controller. During RL training, the dynamic model predicts future βN and tearability from the given plasma conditions and the actuator values chosen by the AI controller. The reward is then computed from the predicted state using equation (1) and fed back to the controller.
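One way to build training pairs of the kind described above is sketched below: label each time sample from the n = 1 Mirnov amplitude, then pair the inputs at time t with the label 25 ms later. The amplitude threshold, the 5-ms sampling and the function names are hypothetical; ref. 5 describes the actual labelling and model inputs.

```python
from bisect import bisect_left
from typing import List, Tuple


def label_tearing(n1_amplitude: List[float], threshold: float) -> List[int]:
    """Label each sample 1 (tearing) if the n=1 Mirnov amplitude exceeds threshold, else 0."""
    return [1 if a > threshold else 0 for a in n1_amplitude]


def horizon_targets(
    times_ms: List[float], labels: List[int], horizon_ms: float = 25.0
) -> List[Tuple[float, int]]:
    """Pair each time t with the label at the first sample at or after t + horizon_ms.

    times_ms must be sorted; samples whose horizon runs past the record are dropped.
    """
    pairs = []
    for t in times_ms:
        j = bisect_left(times_ms, t + horizon_ms)
        if j < len(times_ms):
            pairs.append((t, labels[j]))
    return pairs


# Synthetic record: 5-ms sampling, mode amplitude jumps at t = 60 ms.
times = [5.0 * i for i in range(21)]
amplitude = [0.1 if t < 60.0 else 2.0 for t in times]
targets = horizon_targets(times, label_tearing(amplitude, threshold=1.0))
print(targets[6], targets[7])  # (30.0, 0) (35.0, 1)
```

The target flips to 1 at t = 35 ms, 25 ms before the synthetic mode appears, which is the "look-ahead" the tearability output is meant to provide.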

Figures 4b–d show contour plots of the estimated tearability for possible beam powers under the plasma conditions of our control experiments. The beam power actually commanded by the AI is indicated by the black solid lines. The dashed lines are the contours of the threshold value set for each discharge, which roughly represent the stability limit of the beam power at each time. The plots show that the trained AI controller proactively keeps the beam power from reaching the tearability threshold before any warning of instability.
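The contour plots amount to querying the dynamic model over a range of candidate beam powers at each time slice. The sketch below does this for a single slice and reports the largest scanned power whose predicted tearability stays under the threshold, roughly where the dashed threshold contour sits. `toy_tearability` is a made-up monotonic stand-in for the trained model; with a non-monotonic model, the set of safe powers need not be a single interval.

```python
import math
from typing import Callable, Optional, Sequence


def max_safe_beam_power(
    tearability: Callable[[float], float],
    powers_mw: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Largest scanned beam power whose predicted tearability is below threshold.

    Returns None if every scanned power is at or above the threshold.
    """
    safe = [p for p in powers_mw if tearability(p) < threshold]
    return max(safe) if safe else None


def toy_tearability(power_mw: float) -> float:
    """Hypothetical surrogate: logistic rise centred at 6 MW (not the paper's model)."""
    return 1.0 / (1.0 + math.exp(-(power_mw - 6.0)))


scan = [0.5 * i for i in range(21)]  # 0-10 MW in 0.5-MW steps
print(max_safe_beam_power(toy_tearability, scan, threshold=0.5))  # 5.5
```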

The sensitivity of the tearability to diagnostic errors in electron temperature and density is shown in Extended Data Fig. 2. The filled areas represent the range of tearability predictions when the electron temperature and the electron density, respectively, are increased and decreased by 10% from the measurements in discharge 193280. The uncertainty in tearability due to electron temperature error is, on average, about 10%, and that due to electron density error is about 20%. Even accounting for these diagnostic errors, however, the trend in tearing stability over time remains consistent.
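The sensitivity test can be outlined as rerunning the model with one profile scaled by ±10% and recording the spread of predictions. `toy_model` below is a hypothetical linear stand-in for the trained dynamic model, and the flat profiles are made up; only the perturb-and-rerun pattern reflects the procedure described above.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Profiles = Dict[str, List[float]]


def scaled(profiles: Profiles, key: str, factor: float) -> Profiles:
    """Copy of profiles with one profile multiplied by factor."""
    out = {k: list(v) for k, v in profiles.items()}
    out[key] = [factor * x for x in out[key]]
    return out


def tearability_band(
    model: Callable[[Profiles], float], profiles: Profiles, key: str, rel_err: float = 0.10
) -> Tuple[float, float]:
    """(min, max) prediction when profiles[key] is scaled by 1 - rel_err, 1 and 1 + rel_err."""
    preds = [model(scaled(profiles, key, f)) for f in (1.0 - rel_err, 1.0, 1.0 + rel_err)]
    return min(preds), max(preds)


def toy_model(p: Profiles) -> float:
    """Hypothetical stand-in for the dynamic model: linear in mean ne and Te, capped at 1."""
    return min(1.0, 0.1 * mean(p["ne"]) + 0.05 * mean(p["te"]))


measured = {"ne": [4.0] * 33, "te": [2.0] * 33}
print(tearability_band(toy_model, measured, "ne"))
print(tearability_band(toy_model, measured, "te"))
```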

The dynamic model used to predict future tearing-instability dynamics is integrated with the OpenAI Gym library55, which allows it to interact with the controller as a training environment. The tearing-avoidance controller, another DNN model, is trained using the deep deterministic policy gradient (DDPG)56 method, implemented with Keras-RL (https://keras.io/)57.
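To make the environment role concrete, here is a minimal environment exposing the classic Gym `reset()`/`step()` interface, with `step` returning observation, reward, done and info as in the older Gym releases that Keras-RL targets. It is written without importing Gym so that it runs standalone. `TearingSurrogateEnv`, the toy dynamics and especially `toy_reward` are hypothetical: equation (1) is not reproduced here, and one-step episodes are an assumption based on the training loop described in this section.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Observation = List[float]
Action = Sequence[float]


class TearingSurrogateEnv:
    """Gym-style environment around a learned dynamic model (hypothetical sketch)."""

    def __init__(
        self,
        observations: List[Observation],
        dynamic_model: Callable[[Observation, Action], Tuple[float, float]],
        reward_fn: Callable[[float, float], float],
        seed: int = 0,
    ) -> None:
        self.observations = observations
        self.dynamic_model = dynamic_model  # returns (beta_n, tearability) 25 ms ahead
        self.reward_fn = reward_fn
        self.rng = random.Random(seed)
        self.obs: Observation = []

    def reset(self) -> Observation:
        # Each episode starts from a randomly drawn experimental observation.
        self.obs = self.rng.choice(self.observations)
        return self.obs

    def step(self, action: Action) -> Tuple[Observation, float, bool, Dict[str, float]]:
        beta_n, tearability = self.dynamic_model(self.obs, action)
        reward = self.reward_fn(beta_n, tearability)
        # Assumption: one-step episodes, since the model predicts scalars, not new profiles.
        return self.obs, reward, True, {"beta_n": beta_n, "tearability": tearability}


def toy_model(obs: Observation, action: Action) -> Tuple[float, float]:
    """Made-up dynamics: beta_N grows with beam power (action[0]); tearability fixed."""
    return sum(obs) / len(obs) + 0.1 * action[0], 0.3


def toy_reward(beta_n: float, tearability: float) -> float:
    """Placeholder reward, NOT equation (1): favour beta_N only while tearability < 0.5."""
    return beta_n if tearability < 0.5 else -tearability


env = TearingSurrogateEnv([[1.0, 2.0], [3.0, 4.0]], toy_model, toy_reward)
obs = env.reset()
_, reward, done, info = env.step([5.0, 0.4])
print(obs, reward, done, info)
```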

The observation variables consist of five plasma profiles mapped onto 33 equally spaced grid points in the magnetic flux coordinate: electron density, electron temperature, ion rotation, safety factor and plasma pressure. Because the safety factor q diverges at the plasma boundary when the plasma is diverted, 1/q is used in the observation to reduce numerical difficulties42. The action variables are the total beam power and the triangularity of the plasma boundary, and their controllable ranges were limited to be consistent with the IBS experiments at DIII-D. The AI-controlled plasma boundary shape has been confirmed to be achievable by the ITER poloidal field coil system, as shown in Extended Data Fig. 3.
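A sketch of assembling the observation vector: five profiles resampled on 33 equally spaced points and concatenated, with q replaced by 1/q before resampling. The normalized-flux range [0, 1], the dictionary keys and the profile ordering are assumptions for illustration; the text specifies only the five quantities, the 33-point grid and the use of 1/q.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

GRID_POINTS = 33
PROFILE_ORDER = ("ne", "te", "rotation", "q", "pressure")  # hypothetical keys


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation with constant extrapolation; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def build_observation(psi_n: Sequence[float], profiles: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate five profiles sampled on 33 equally spaced points of psi_n in [0, 1].

    q is replaced by 1/q before interpolation, so a diverging q at the boundary maps to 0.
    """
    grid = [i / (GRID_POINTS - 1) for i in range(GRID_POINTS)]
    obs: List[float] = []
    for name in PROFILE_ORDER:
        values = list(profiles[name])
        if name == "q":
            values = [1.0 / v for v in values]  # 1.0 / inf == 0.0 in Python
        obs.extend(interp(g, psi_n, values) for g in grid)
    return obs


psi = [0.0, 0.5, 1.0]
prof = {"ne": [4.0, 3.0, 1.0], "te": [3.0, 2.0, 0.2], "rotation": [50.0, 30.0, 5.0],
        "q": [1.0, 2.0, float("inf")], "pressure": [60.0, 30.0, 1.0]}
obs = build_observation(psi, prof)
print(len(obs), obs[3 * 33], obs[3 * 33 + 16], obs[3 * 33 + 32])  # 165 1.0 0.5 0.0
```

Transforming before interpolating keeps the boundary value finite, which is the numerical motivation the text gives for using 1/q.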

The RL training process of the AI controller is depicted in Extended Data Fig. 4. At each iteration, the observation variables (five profiles) are randomly sampled from experimental data. From this observation, the AI controller determines the desired beam power and plasma triangularity. To reduce the risk of converging to a local optimum, action noise generated by an Ornstein–Uhlenbeck process is added to the control action during training. The dynamic model then predicts βN and tearability 25 ms later from the given plasma profiles and actuator values. The reward is evaluated from these predicted states according to equation (1) and fed back to the AI controller for RL. Because both the controller and the dynamic model observe the plasma profiles, they can reflect changes in tearing stability even when the profiles vary owing to unpredictable factors such as wall conditions or impurities. In addition, although this paper focuses on IBS conditions where tearing instability is critical, the RL training itself was not restricted to any specific experimental conditions, ensuring its applicability across all conditions. After training, the Keras-based controller model is converted to C using the Keras2C library58 for integration into the PCS.
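Ornstein–Uhlenbeck noise is temporally correlated and mean-reverting, which suits exploration in continuous action spaces such as DDPG's. The standalone sketch below shows the update rule and one way the noise might be added to, and clipped around, the controller's action. Keras-RL ships its own implementation (`rl.random.OrnsteinUhlenbeckProcess`); the parameter values, the shared `sigma` for both actions and the action ranges here are illustrative assumptions, not those used in this work.

```python
import math
import random
from typing import List, Optional, Sequence


class OrnsteinUhlenbeckNoise:
    """Mean-reverting noise: x <- x + theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""

    def __init__(self, size: int, theta: float = 0.15, mu: float = 0.0,
                 sigma: float = 0.2, dt: float = 1.0, seed: Optional[int] = None) -> None:
        self.size, self.theta, self.mu, self.sigma, self.dt = size, theta, mu, sigma, dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> None:
        self.x: List[float] = [self.mu] * self.size

    def sample(self) -> List[float]:
        self.x = [
            xi + self.theta * (self.mu - xi) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for xi in self.x
        ]
        return list(self.x)


def noisy_action(action: Sequence[float], noise: Sequence[float],
                 low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Add exploration noise and clip each component to its allowed range."""
    return [min(hi, max(lo, a + n)) for a, n, lo, hi in zip(action, noise, low, high)]


# Hypothetical action: [beam power in MW, top triangularity]; ranges are made up.
ou = OrnsteinUhlenbeckNoise(size=2, sigma=0.1, seed=0)
print(noisy_action([5.0, 0.4], ou.sample(), low=[0.0, 0.0], high=[10.0, 1.0]))
```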

Previously, a related work17 employed a simple bang-bang control scheme that used only the beam power to regulate tearability. Although our control performance may seem similar to that work in terms of βN, this is not the case once other operating conditions are considered. In ITER and future fusion devices, achieving a higher normalized fusion gain (G ∝ Q) while keeping core instabilities stable is critical. This requires high βN and small q95, as G ∝ βN/q95². At the same time, owing to limited heating capability, high G must be achieved with weak plasma rotation (or beam torque). High βN, small q95 and low torque are all destabilizing conditions for tearing instability, making tearing instability a substantial bottleneck for ITER.
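Because only a proportionality is given, the scaling is most useful for ratios. The sketch below compares two operating points under G ∝ βN/q95²; the βN and q95 values are hypothetical, chosen only to show that lowering q95 from 4 to 3 at fixed βN raises G by (4/3)² ≈ 1.78.

```python
def relative_gain(beta_n: float, q95: float, beta_n_ref: float, q95_ref: float) -> float:
    """G / G_ref under the scaling G ∝ beta_N / q95**2; the unknown prefactor cancels."""
    return (beta_n / q95**2) / (beta_n_ref / q95_ref**2)


# Hypothetical values: same beta_N, q95 lowered from 4 to 3.
print(round(relative_gain(2.0, 3.0, 2.0, 4.0), 3))  # 1.778
```

The same ratio shows why low-q95 operation is attractive for fusion gain and, as the text notes, also why it pushes the plasma towards tearing-unstable conditions.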

As shown in Extended Data Fig. 5, our control achieves tearing-stable operation at a much higher G than the test experiment in ref. 17. This is possible by maintaining a higher (or similar) βN at lower q95 (4 → 3), where tearing instability is more likely to occur. In addition, this is achieved with much weaker torque, further highlighting the capability of our RL controller under harsher conditions. This work therefore demonstrates more ITER-relevant performance, providing a clearer path to high fusion gain with robust tearing avoidance in future devices.

The performance of RL control in achieving high fusion gain is further highlighted by the non-monotonic effect of βN on tearing instability. Unlike q95 or torque, both increasing and decreasing βN can destabilize tearing instabilities. This implies an optimal fusion gain (as G ∝ βN) for tearing-stable operation, which makes control more complicated. Extended Data Fig. 6 shows the trace of an RL-controlled discharge in the space of fusion gain versus time, with the contour colour indicating tearability. The RL controller successfully drives the plasma through the valley of low tearability, ensuring stable operation in this complicated system.

This performance is made possible by the advantages of RL over conventional approaches, described below.

By employing a multi-actuator (beam and shape), multi-objective (low tearability and high βN) controller trained with RL, we were able to enter a higher-βN region while maintaining tolerable tearability. As shown in Extended Data Fig. 5, our controlled discharge (193280) reaches higher βN and G than the discharge in the previous work (176757). This advantage arises because our controller adjusts the beam power and plasma shape simultaneously, both increasing βN and lowering tearability. Notably, our discharge operated under less favourable conditions (lower q95 and lower torque) for both βN and tearing stability.

The previous tearability model evaluates the tearing likelihood from current zero-dimensional measurements, without considering upcoming actuation. In contrast, our model takes detailed one-dimensional profiles together with the upcoming actuations and predicts the future tearability response to that control, which gives more flexibility for control. Our RL controller has been trained on this tearability response and can account for future effects, whereas the previous controller sees only the current stability. By considering future responses, our controller provides more optimal actuation over the longer term instead of acting greedily.

This enables application in more generic situations beyond our experiments. For instance, as shown in Extended Data Fig. 7a, tearability is a nonlinear function of βN. In some cases (Extended Data Fig. 7b), this relation is also non-monotonic, so that increasing the beam power can be the appropriate command to reduce tearability (the right-pointing arrow in Extended Data Fig. 7b). This reflects the diversity of tearing-instability sources, such as the βN limit and the current well. In such cases, a simple controller like that in ref. 17 could produce oscillatory actuation or even further destabilization. RL control, in contrast, oscillates less, brings tearability below the threshold more swiftly and achieves higher βN through multi-actuator control, as shown in Extended Data Fig. 7c.
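A toy model makes the point about simple feedback concrete. Below, a generic beam-only bang-bang rule (a stand-in, not the controller of ref. 17) acts on a made-up tearability curve with a minimum at an intermediate beam power. Above the minimum, the rule chatters around the threshold; below it, cutting beam power moves the plasma further from the minimum and tearability keeps rising.

```python
from typing import List, Tuple


def toy_tearability(beam_mw: float) -> float:
    """Hypothetical non-monotonic response: minimum at 6 MW, rising on both sides, capped at 1."""
    return min(1.0, 0.3 + 0.02 * (beam_mw - 6.0) ** 2)


def bang_bang(beam_mw: float, threshold: float, step_mw: float = 0.5,
              n_steps: int = 6) -> List[Tuple[float, float]]:
    """Beam-only bang-bang rule: lower power when above threshold, raise it otherwise."""
    history = []
    for _ in range(n_steps):
        t = toy_tearability(beam_mw)
        history.append((beam_mw, t))
        beam_mw = max(0.0, beam_mw - step_mw if t > threshold else beam_mw + step_mw)
    return history


# High-power side: the rule chatters around the threshold (9.0 <-> 9.5 MW).
print([b for b, _ in bang_bang(10.0, threshold=0.5)])  # [10.0, 9.5, 9.0, 9.5, 9.0, 9.5]
# Low-power side: cutting power moves away from the minimum, so tearability keeps rising.
print([round(t, 3) for _, t in bang_bang(2.0, threshold=0.5)])
```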

Plasma shape parameters are key control knobs that influence various types of plasma instability. In DIII-D, the shape parameters such as triangularity and elongation can be manipulated through proximity control41. In this study, we used the top triangularity as one of the action variables for the AI controller. The bottom triangularity remained fixed across our experiments because it is directly linked to the strike point on the inner wall.

We also note that the changes in top triangularity made by the AI controller are quite large compared with typical adjustments. It is therefore necessary to verify whether such large shape changes are within the capability of the ITER magnetic coils. Additional analysis, shown in Extended Data Fig. 3, confirms that the rescaled plasma shape for ITER can be achieved within the coil current limits.

The experiments in Figs. 3b and 4a have shown that tearability can be kept low through appropriate AI-based control. However, it is necessary to verify whether low tearability can be maintained robustly when additional actuators are added and plasma conditions change. In particular, ITER plans to use not only 50 MW of beams but also 10–20 MW of radiofrequency actuators. Electron cyclotron radiofrequency heating directly changes the electron temperature profile, and stability can respond sensitively to it. We therefore conducted an experiment to test whether the AI controller maintains low tearability under new conditions with added radiofrequency heating. In discharge 193282 (green lines in Extended Data Fig. 8), 1.8 MW of radiofrequency heating was preprogrammed to be applied steadily in the background while the beam power and plasma triangularity were controlled by the AI. The radiofrequency heating was deposited towards the plasma core, and the current drive at the tearing location was negligible.

However, owing to a sudden loss of plasma current control at t = 3.1 s, q95 increased from 3 to 4, and the remainder of the discharge no longer followed the ITER baseline condition. This change in plasma current control was unintentional and not directly related to the AI control. The resulting plasma current fluctuation sharply raised the tearability above the threshold temporarily at t = 3.2 s, but it was immediately stabilized by the continued AI control. Although the discharge eventually disrupted before the preprogrammed end of the flat top, owing to insufficient plasma current after the loss of current control, this accidental experiment demonstrates the robustness of AI-based tearability control against additional heating actuators, a wider q95 range and accidental current fluctuations.

In typical plasma experiments, control parameters are kept stationary with a feed-forward set-up, so each discharge is a single data point. In our experiments, however, both the plasma and the control vary throughout the discharge, so one discharge contains many control cycles. Our results therefore carry more weight than their number of discharges alone would suggest compared with standard fixed-control experiments, supporting the reliability of the control scheme.

In addition, Extended Data Fig. 9a,b shows the plasma response to RL control predicted for 1,000 samples randomly selected from the experimental database, which includes not just the IBS but all experimental conditions. When the tearability T > 0.5 (unstable, top), the controller tries to decrease T rather than change βN, and when T < 0.5 (stable, bottom), it tries to increase βN. This matches the response expected from the reward in equation (1). In 98.6% of the unstable-phase samples, the controller reduced the tearability, and in 90.7% of the stable-phase samples, it increased βN.
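The 98.6% and 90.7% figures are conditional fractions over the predicted responses. A sketch of that bookkeeping follows; the tuple layout is an assumption, samples with T exactly at the threshold are excluded as neither stable nor unstable, and the demo data are invented and do not reproduce the paper's numbers.

```python
import math
from typing import Iterable, Tuple

# (tearability before, tearability after, beta_N before, beta_N after)
Sample = Tuple[float, float, float, float]


def response_fractions(samples: Iterable[Sample], threshold: float = 0.5) -> Tuple[float, float]:
    """Fraction of unstable samples (T > threshold) whose tearability dropped, and
    fraction of stable samples (T < threshold) whose beta_N rose; NaN if a group is empty."""
    samples = list(samples)
    unstable = [s for s in samples if s[0] > threshold]
    stable = [s for s in samples if s[0] < threshold]
    frac_t_down = sum(s[1] < s[0] for s in unstable) / len(unstable) if unstable else math.nan
    frac_beta_up = sum(s[3] > s[2] for s in stable) / len(stable) if stable else math.nan
    return frac_t_down, frac_beta_up


# Four made-up samples, purely to exercise the function.
demo = [(0.8, 0.6, 2.0, 2.0), (0.7, 0.75, 2.0, 2.1), (0.2, 0.25, 1.8, 2.0), (0.3, 0.3, 2.0, 1.9)]
print(response_fractions(demo))  # (0.5, 0.5)
```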

Extended Data Fig. 9c shows the achieved time-integrated βN for the sequence of discharges in our experiment session. Discharges up to 193276 either had no RL control applied or developed tearing instability before the control started, and discharges from 193277 onward had RL control applied. Before RL control, all shots except one (193266, the low-βN reference shown in Fig. 3b) disrupted, but after RL control was applied, only two (193277 and 193282) disrupted, as discussed earlier. The average time-integrated βN also increased with RL control. In addition, Extended Data Fig. 10 compares the input feature ranges of the controlled discharges with the distribution of the training database, indicating that our experiments are neither too centred (the model is not overfitted to our experimental conditions) nor too far out of distribution (confirming the applicability of our controller to the experiments).
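Time-integrated βN is the integral of the βN trace over the discharge. A trapezoidal sketch is below; the integration window used for Extended Data Fig. 9c is not stated here, so the function simply integrates over whatever samples it is given, and the traces are hypothetical.

```python
from typing import Sequence


def time_integrated(times_s: Sequence[float], values: Sequence[float]) -> float:
    """Trapezoidal integral of a sampled signal, e.g. beta_N(t), over the recorded times."""
    if len(times_s) != len(values):
        raise ValueError("times_s and values must have the same length")
    return sum(
        0.5 * (values[i] + values[i + 1]) * (times_s[i + 1] - times_s[i])
        for i in range(len(times_s) - 1)
    )


# Hypothetical traces: a shot that ends at 2 s accumulates less than one lasting to 4 s.
print(time_integrated([0.0, 1.0, 2.0], [2.0, 2.0, 2.0]))  # 4.0
print(time_integrated([0.0, 1.0, 2.0, 3.0, 4.0], [1.5] * 5))  # 6.0
```

Because the metric grows with both βN and duration, it rewards controllers that raise βN without cutting the discharge short.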
