First, I briefly review the concept of Neural Operators (NO) and then move to Recurrent Neural Operators (RNO).
1. Neural Operator (NO)
A Neural Operator is a neural network that learns a mapping between function spaces, which makes it well suited to solving partial differential equations (PDEs).
Given a coefficient function a(x,t), the model outputs the solution u(x,t).
The distinguishing feature of NO is its kernel-based integration structure.
After lifting the input function a into a higher-dimensional representation (using a neural network or similar mapping), the model repeatedly updates this latent representation by combining a pointwise matrix multiplication with a kernel integral taken over the domain.
In practice, this integration is implemented as a numerical summation. Due to this structure, one of the strongest properties of NO emerges: sampling invariance.
Sampling invariance (often called discretization invariance in the literature) means that even if we increase the sampling resolution of a(x,t), the dimension of the latent vector at each sample point does not increase. Instead, only the number of terms used to approximate the integral increases. Therefore, the same trained parameters can be applied to inputs sampled at a different resolution during training or testing without changing the model structure, which is a very important advantage of Neural Operators.
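To make the summation view concrete, here is a minimal NumPy sketch of one kernel-integration update. It is only an illustration under assumptions that are not from the paper: a 1D domain [0, 1], a fixed Gaussian kernel standing in for a learned one, and the hypothetical names gaussian_kernel, kernel_layer, P, W, and width. The point is that the parameters do not depend on the number of sample points n.

```python
import numpy as np

def gaussian_kernel(x, y, width):
    # Hypothetical stand-in for a learned kernel k(x, y); shape (len(x), len(y)).
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / width)

def kernel_layer(v, x, W, width):
    """One update v <- tanh(v W + integral of k(x, y) v(y) dy).

    v: (n, d) latent values at the n sample points x; W: (d, d).
    """
    n = x.shape[0]
    K = gaussian_kernel(x, x, width)   # (n, n)
    integral = (K @ v) / n             # Riemann sum over the n sample points
    return np.tanh(v @ W + integral)

# Parameters are created once; none of their shapes involve n.
rng = np.random.default_rng(0)
d = 4
P = rng.normal(size=(1, d))            # lifting map: scalar input -> d channels
W = 0.1 * rng.normal(size=(d, d))
width = 0.1

def forward(n):
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    a = np.sin(2 * np.pi * x)          # example input function sampled at n points
    v = a[:, None] * P                 # lift to shape (n, d)
    return kernel_layer(v, x, W, width)

out_coarse = forward(64)               # shape (64, d)
out_fine = forward(256)                # shape (256, d), same parameters
```

Every fourth point of the fine grid coincides with a coarse grid point, so out_fine[::4] can be compared with out_coarse directly. The two differ only by the quadrature error of the sum, not because the model changed.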
2. Recurrent Neural Operator (RNO)
Recurrent Neural Operator (RNO) is a special case of NO that is designed to be robust in long-term extrapolation.
In long-term prediction problems, we can usually observe a(x,t) only over a limited time window.
A common approach is teacher forcing, where we input a(x,t) and the ground-truth u(x,t) to predict u(x,t+1).
In RNO, however, the model takes a(x,t) and its own predicted solution u(x,t) as input to predict u(x,t+1).
This forces the model to learn dynamics that are robust to its own prediction errors.
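The difference between the two input schemes can be sketched with a toy one-step model. Everything here is a hypothetical illustration and not the paper's setup: step is a linear stand-in for a neural operator, the "true" dynamics are u(t+1) = 0.9 u(t) + 0.1 a(t), and the model is given a slightly wrong coefficient so that the effect of feeding back its own output becomes visible.

```python
import numpy as np

def step(a_t, u_t, w_a, w_u):
    # Hypothetical one-step model standing in for a neural operator.
    return w_u * u_t + w_a * a_t

T, n = 50, 32
x = np.linspace(0.0, 1.0, n, endpoint=False)
a = np.tile(1.0 + 0.5 * np.sin(2 * np.pi * x), (T, 1))   # a(x, t), shape (T, n)

# Ground truth from the assumed dynamics u(t+1) = 0.9 u(t) + 0.1 a(t).
u_true = np.empty((T, n))
u_true[0] = a[0]
for t in range(T - 1):
    u_true[t + 1] = step(a[t], u_true[t], 0.1, 0.9)

w_a, w_u = 0.1, 0.95   # slightly mis-specified model parameters

# Teacher forcing: each prediction starts from the ground-truth u(x, t).
tf_pred = np.stack([step(a[t], u_true[t], w_a, w_u) for t in range(T - 1)])

# Self-fed rollout (the RNO-style input): each prediction starts from
# the model's own previous output.
u = u_true[0]
ro_pred = []
for t in range(T - 1):
    u = step(a[t], u, w_a, w_u)
    ro_pred.append(u)
ro_pred = np.stack(ro_pred)

tf_err = np.abs(tf_pred - u_true[1:])   # one-step errors, stay small
ro_err = np.abs(ro_pred - u_true[1:])   # errors compound over the rollout
```

In this toy case the teacher-forced error stays at the one-step level, while the rollout error accumulates with every step. A loss computed on ro_pred therefore penalizes exactly the compounding error that matters in long-term prediction, which a teacher-forced loss never sees.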
As a result, for highly nonlinear PDEs such as the Navier–Stokes equations, RNO shows extremely strong performance.
According to the paper, RNO can be up to 1,000 times more accurate than the standard teacher-forcing approach in long-term prediction tasks.
3. Q&A
Q1. Since RNO needs the whole sequence to compute the loss, parallel computation is not possible. Is this a critical drawback?
Yes, this is a clear limitation of RNO. Because the model is recurrent, we cannot parallelize the computation over time steps.
As mentioned by the authors, RNO has significantly higher computational overhead per epoch and consumes more GPU memory compared to standard NO with teacher forcing. However, the paper reports that both training time and memory usage still remain within practical limits, so this drawback is acceptable in many real applications.
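The parallelization issue can be seen in a small sketch. It again uses a hypothetical linear step function as a stand-in for the operator, with random arrays as placeholder data. With teacher forcing, every input pair (a(x,t), u(x,t)) is known in advance, so all time steps fit into one batched call. With a self-fed rollout, step t+1 cannot start until step t has finished.

```python
import numpy as np

def step(a_t, u_t, w_a=0.1, w_u=0.9):
    # Hypothetical one-step model; broadcasts over a leading time axis.
    return w_u * u_t + w_a * a_t

T, n = 20, 16
rng = np.random.default_rng(0)
a = rng.normal(size=(T, n))        # placeholder for a(x, t)
u_true = rng.normal(size=(T, n))   # placeholder for ground-truth u(x, t)

# Teacher forcing: one batched call covers all T-1 time steps at once.
tf_batched = step(a[:-1], u_true[:-1])   # shape (T-1, n)

# Self-fed rollout: a sequential loop, each step waits for the previous one.
u = u_true[0]
rollout = []
for t in range(T - 1):
    u = step(a[t], u)
    rollout.append(u)
rollout = np.stack(rollout)              # shape (T-1, n)
```

When such a rollout is trained with backpropagation, the intermediate states of the whole chain generally have to be kept for the backward pass, which is consistent with the higher memory use noted above.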
Q2. RNO seems to simply make the task harder by forcing extrapolation. Is learning actually guaranteed to converge?
The paper does not provide a strict theoretical guarantee for this question.
However, in my opinion, the fact that RNO can successfully handle one of the hardest nonlinear PDEs, namely the Navier–Stokes equations, is already strong empirical evidence that training can converge in practice. Therefore, it seems reasonable to expect that RNO can solve many other time-dependent PDE problems, as long as the nonlinearity is predictable to some extent.