Discussing the article: "Neural networks made easy (Part 89): Frequency Enhanced Decomposition Transformer (FEDformer)"

 

Check out the new article: Neural networks made easy (Part 89): Frequency Enhanced Decomposition Transformer (FEDformer).

All the models we have considered so far analyze the state of the environment as a time sequence. However, the time series can also be represented in the form of frequency features. In this article, I introduce you to an algorithm that uses frequency components of a time sequence to predict future states.

Long-term time series forecasting is a long-standing challenge in many applied fields. Transformer-based models show promising results. However, their high computational complexity and memory requirements make it difficult to apply the Transformer to long sequences. This has prompted numerous studies aimed at reducing the computational cost of the Transformer algorithm.

Despite the progress made by Transformer-based time series forecasting methods, in some cases they fail to capture the overall characteristics of the time series distribution. The authors of the paper "FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting" attempt to solve this problem. They compare the actual data of a time series with the predicted values obtained from the vanilla Transformer. Below is a screenshot from that paper.

You can see that the distribution of the forecast time series is very different from the true one. The discrepancy between expected and predicted values can be explained by the point-wise attention and prediction in the Transformer. Since the forecast for each time step is made individually and independently, it is likely that the model cannot preserve the global properties and statistics of the time series as a whole. To solve this problem, the authors of the paper exploit two ideas.

The first is to use seasonal-trend decomposition, an approach widely used in time series analysis. The authors of the paper present a special model architecture that brings the distribution of forecasts closer to the true one.
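
To make the decomposition idea more concrete, here is a minimal Python sketch that splits a series into a trend and a seasonal remainder with a simple moving average. It is only an illustration: the function name `decompose`, the `window` parameter and the edge-padding scheme are assumptions made for this example, not the architecture from the paper or the code from the article.

```python
import numpy as np

def decompose(series, window=25):
    """Split a 1-D series into a moving-average trend and a seasonal remainder.

    Illustrative sketch only: `window` and the edge padding are assumptions.
    """
    series = np.asarray(series, dtype=float)
    pad_left = (window - 1) // 2
    pad_right = window - 1 - pad_left
    # Repeat the edge values so the trend has the same length as the input.
    padded = np.concatenate([
        np.full(pad_left, series[0]),
        series,
        np.full(pad_right, series[-1]),
    ])
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    # Whatever the moving average does not explain is treated as seasonal.
    seasonal = series - trend
    return trend, seasonal

# Example: a linear trend plus a sine wave with a period of 20 steps.
t = np.arange(200)
series = 0.05 * t + np.sin(2 * np.pi * t / 20)
trend, seasonal = decompose(series, window=20)
```

By construction, the two components always add up to the original series, so nothing is lost in the split.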

The second idea is to integrate Fourier analysis into the Transformer algorithm. Instead of applying the Transformer to the sequence in the time domain, we can analyze its frequency features. This helps the Transformer better capture the global properties of time series.
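
The frequency-domain view can be sketched in a few lines of Python. The example below transforms a series with the FFT, keeps only a handful of frequency components and transforms the result back. The function name `keep_top_modes`, the `n_modes` parameter and the rule of keeping the largest-amplitude components are assumptions chosen for illustration; this is not the mode selection or the attention block used in FEDformer.

```python
import numpy as np

def keep_top_modes(series, n_modes=4):
    """Keep the n_modes largest-amplitude frequency components of a 1-D series.

    Illustrative sketch only: the selection rule is an assumption.
    Returns the reconstructed series and the sorted indices of the kept modes.
    """
    series = np.asarray(series, dtype=float)
    # Real-valued FFT: time domain -> frequency domain.
    spectrum = np.fft.rfft(series)
    # Indices of the components with the largest amplitude.
    top = np.argsort(np.abs(spectrum))[-n_modes:]
    filtered = np.zeros_like(spectrum)
    filtered[top] = spectrum[top]
    # Inverse FFT: frequency domain -> time domain.
    restored = np.fft.irfft(filtered, n=len(series))
    return restored, np.sort(top)

# Example: two periodic components with 4 and 10 cycles per 128 steps.
t = np.arange(128)
series = np.sin(2 * np.pi * 4 * t / 128) + 0.5 * np.cos(2 * np.pi * 10 * t / 128)
restored, modes = keep_top_modes(series, n_modes=2)
```

Each frequency component describes the whole sequence at once, which is why a small number of components can capture its global shape.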

The combination of the proposed ideas is implemented in the Frequency Enhanced Decomposition Transformer model, FEDformer.

Author: Dmitriy Gizlyk