Neural Networks in Trading: Hybrid Graph Sequence Models (Final Part)
Introduction
In the previous article, we examined the theoretical aspects of the unified GSM++ graph sequence framework, an innovative data processing method that combines the advantages of multiple architectural solutions. GSM++ comprises three key stages: graph tokenization, local node encoding, and global dependency encoding. This structure enhances the efficiency of working with graph-structured data, improving analytical capabilities for complex tasks in finance and other domains involving time series and structured data analysis.
A critical element of the system is hierarchical graph tokenization, which enables the transformation of complex data into a compact sequential representation. The tokenization methodology used in GSM++ preserves the topological and temporal characteristics of the original data, significantly improving feature extraction accuracy. Moreover, this approach helps reduce computational costs when analyzing large datasets, providing a balance between processing speed and analytical depth. Depending on the task, the level of analysis detail can be adapted, making the methodology both versatile and flexible.
Local node encoding plays an important role in processing information at the level of individual graph elements. Traditional analysis methods often encounter information redundancy, which increases computational load and complicates the detection of patterns. However, the use of adaptive encoding mechanisms allows the extraction of the most significant node features and their efficient transmission to subsequent analysis layers. This reduces the volume of irrelevant information and enhances the model's ability to identify local relationships between nodes. Another advantage of local encoding is its capacity to dynamically adapt to changes in the source data, which is particularly critical in volatile financial markets where sudden shifts can significantly impact forecast accuracy.
Analytical capabilities are further enhanced by the use of a hybrid encoder that combines recurrent models and transformers. This approach leverages the strengths of both methods: recurrent mechanisms efficiently process time series by capturing event sequences, while transformers with Self-Attention mechanisms effectively identify complex, order-independent dependencies. This combination not only improves model accuracy but also increases robustness to market dynamics. Additionally, the hybrid encoder can adapt to various scenarios, allowing a configurable balance between forecasting accuracy and computational efficiency depending on specific task requirements.

In the practical section of the previous article, we began implementing our own interpretation of the GSM++ framework using MQL5. Considering the high volatility of financial data, we decided not to use the framework authors' proposed hierarchical similarity-based clustering (HAC). Instead, we chose a trainable mixed tokenization module, which significantly increases model flexibility and adaptability when working with real market data.
The implemented Mixture of Tokenization (MoT) algorithm employs four different token types per bar, enabling more detailed market data analysis. For this experiment, we use the following approaches to encode the source data:
- Node tokenization — each bar is treated as a separate analysis element, allowing the assessment of its individual characteristics and identification of key parameters influencing subsequent developments.
- Edge tokenization — analyzes dependencies between neighboring bars, detecting short-term correlations and trends useful for predicting near-term changes.
- Subgraph tokenization — examines groups of bars to identify more complex structures and stable patterns important for strategic forecasting.
- Subgraph tokenization of individual univariate sequences — enables in-depth analysis of univariate sequences and their interdependencies, critical for uncovering hidden patterns in the data.
These tokens are combined using the Attention Pooling mechanism, adapted from the R-MAT framework. This method allows the model to focus on the most significant features while discarding less relevant data, substantially improving decision-making. The main advantage of Attention Pooling is its ability to efficiently handle complex data structures, emphasizing the most relevant characteristics while minimizing noise impact.
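To make the pooling step more tangible, below is a minimal, self-contained sketch of softmax-weighted pooling over per-bar token embeddings. The function name AttentionPoolSketch, the externally supplied query vector, and the use of plain loops over MQL5 matrix/vector types are illustrative assumptions; the actual CNeuronMoT object implements this logic on OpenCL buffers with trainable parameters.

//+------------------------------------------------------------------+
//| Illustrative softmax-weighted pooling over token embeddings      |
//| tokens : one row per token type for a single bar                 |
//| query  : trainable query vector (passed in here for brevity)     |
//+------------------------------------------------------------------+
vector AttentionPoolSketch(const matrix &tokens, const vector &query)
  {
   int    n   = (int)tokens.Rows();             // number of token types (4)
   int    dim = (int)tokens.Cols();             // embedding size
   vector score(n);
//--- scaled dot-product score of each token against the query
   for(int i = 0; i < n; i++)
     {
      double s = 0;
      for(int j = 0; j < dim; j++)
         s += tokens[i][j] * query[j];
      score[i] = s / MathSqrt((double)dim);
     }
//--- softmax over token types
   double max_s = score.Max();
   double sum   = 0;
   for(int i = 0; i < n; i++)
     {
      score[i] = MathExp(score[i] - max_s);
      sum     += score[i];
     }
//--- pooled representation = weighted sum of token embeddings
   vector pooled = vector::Zeros(dim);
   for(int i = 0; i < n; i++)
      for(int j = 0; j < dim; j++)
         pooled[j] += tokens[i][j] * score[i] / sum;
   return pooled;
  }

The pooled vector emphasizes whichever token type scores highest against the query while retaining a soft contribution from the others, which is exactly the behavior we rely on when merging the four token streams.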
To implement this approach, we created the CNeuronMoT object, inheriting base functionality from CNeuronMHAttentionPooling, ensuring efficient application of the Attention Pooling algorithm. This modular design enhances model adaptability, improves market data processing, and increases the quality of price movement analysis and forecasting, making it a valuable tool for algorithmic trading.
The next stage of data processing is local node encoding. In our implementation, we use a previously developed adaptive feature smoothing module. Node-Adaptive Feature Smoothing (NAFS) generates more informative node embeddings by accounting for both graph structure and individual node characteristics. This approach assumes that different nodes may require different smoothing levels. This enables adaptive processing of each node in the context of its neighborhood. NAFS applies a combined low- and high-order smoothing approach, effectively capturing both local and global dependencies within the graph.
NAFS also employs an ensemble feature aggregation method, which enhances model robustness against noise and improves encoding reliability; a simplified sketch of the smoothing idea follows the list below. Key benefits of the NAFS module include:
- Flexible data filtering, highlighting the most significant features while eliminating noise.
- Optimization of computational costs, critical when analyzing large graphs and high-frequency market data.
- Adaptability to changing conditions via dynamic adjustment of smoothing parameters.
- Improved model accuracy through a balanced combination of detailed analysis and high generalization capability.
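As referenced above, here is a deliberately simplified sketch of node-adaptive smoothing over an MQL5 matrix. The row-normalized adjacency matrix, the number of hops, and the heuristic of down-weighting hops that pull a node far from its own features are assumptions introduced purely for illustration; the library's NAFS module works on OpenCL buffers and uses its own weighting scheme.

//+------------------------------------------------------------------+
//| Simplified node-adaptive feature smoothing (illustrative sketch) |
//| feats : nodes x dim feature matrix                               |
//| adj   : row-normalized nodes x nodes adjacency matrix            |
//| hops  : number of smoothing orders to combine                    |
//+------------------------------------------------------------------+
matrix NafsSketch(const matrix &feats, const matrix &adj, const int hops)
  {
   int    nodes = (int)feats.Rows();
   int    cols  = (int)feats.Cols();
   matrix hop   = feats;                        // current k-hop smoothed features
   matrix acc   = feats;                        // weighted accumulator (0-hop included)
   vector wsum  = vector::Ones(nodes);          // per-node sums of mixture weights
   for(int k = 1; k <= hops; k++)
     {
      hop = adj.MatMul(hop);                    // propagate features one more hop
      for(int i = 0; i < nodes; i++)
        {
         //--- node-adaptive weight: hops that pull a node far away
         //--- from its own features contribute less (rough heuristic)
         double diff = 0;
         for(int j = 0; j < cols; j++)
            diff += MathAbs(hop[i][j] - feats[i][j]);
         double w = 1.0 / (1.0 + diff);
         for(int j = 0; j < cols; j++)
            acc[i][j] += w * hop[i][j];
         wsum[i] += w;
        }
     }
//--- normalize each node's mixture of smoothing orders
   for(int i = 0; i < nodes; i++)
      for(int j = 0; j < cols; j++)
         acc[i][j] /= wsum[i];
   return acc;
  }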
The final core component of GSM++ is the hybrid encoder. The framework authors propose combining a Mamba module with a Transformer. In our implementation, we follow this concept but refine it further, replacing Mamba with Chimera and the Transformer with Hidformer.
Chimera employs two-dimensional state-space models (2D-SSM), effectively modeling dependencies along both the temporal axis and an additional dimension related to graph topology. This approach significantly expands the ability to analyze complex market relationships; a toy recurrence sketch follows the list below. Chimera's advantages include:
- Two-dimensional dependency encoding, enhancing the detection of hidden market patterns and forecasting accuracy.
- Increased model expressiveness, enabling deeper analysis of complex nonlinear asset relationships.
- Adaptability to dynamic market changes, allowing the model to respond quickly to evolving conditions.
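The toy recurrence below, referenced above, shows how a two-dimensional state-space model propagates a hidden state along the temporal axis and the feature (topology) axis at once. The scalar parameters a_t, a_d, b, and c stand in for the trainable state matrices of the real Chimera module and are purely illustrative.

//+------------------------------------------------------------------+
//| Toy discretized 2D state-space recurrence (illustrative sketch)  |
//| x : time x features input matrix                                 |
//+------------------------------------------------------------------+
matrix Ssm2DSketch(const matrix &x, const double a_t, const double a_d,
                   const double b, const double c)
  {
   int    T = (int)x.Rows();
   int    D = (int)x.Cols();
   matrix h = matrix::Zeros(T, D);
   matrix y = matrix::Zeros(T, D);
   for(int t = 0; t < T; t++)
      for(int d = 0; d < D; d++)
        {
         double prev_t = (t > 0 ? h[t - 1][d] : 0.0);   // state along the temporal axis
         double prev_d = (d > 0 ? h[t][d - 1] : 0.0);   // state along the feature/topology axis
         h[t][d] = a_t * prev_t + a_d * prev_d + b * x[t][d];
         y[t][d] = c * h[t][d];
        }
   return y;
  }

Because each state cell depends on its predecessors along both axes, patterns that span several bars and several features simultaneously can influence the output, which is the property the list above refers to as two-dimensional dependency encoding.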
Hidformer features a dual-stream architecture that, unlike the classical Transformer, separates the processing of input data into two paths: one encoder analyzes temporal dependencies, while the other processes frequency components of market data. This design enables more accurate modeling of market dynamics (a schematic illustration of the two-stream split follows the list below). Hidformer's main advantages are:
- Separation of temporal and frequency analysis, improving the precision of market trend forecasts.
- Use of recursive attention in the temporal encoder and linear attention in the frequency encoder, reducing computational complexity and improving efficiency.
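The sketch below, referenced above, only illustrates the idea of splitting a single price series into a time-domain stream and a frequency-domain stream. The naive O(n²) DFT magnitude spectrum and the function name are assumptions made for demonstration; Hidformer's encoders operate on learned token representations rather than raw magnitudes.

//+------------------------------------------------------------------+
//| Splitting a series into a time stream and a naive frequency      |
//| stream (illustrative sketch of the dual-stream idea)             |
//+------------------------------------------------------------------+
void DualStreamSplitSketch(const vector &series, vector &time_stream,
                           vector &freq_stream)
  {
   int n = (int)series.Size();
   time_stream = series;                        // time branch: the series itself
   freq_stream = vector::Zeros(n / 2);          // frequency branch: DFT magnitudes
   for(int k = 0; k < n / 2; k++)
     {
      double re = 0, im = 0;
      for(int t = 0; t < n; t++)
        {
         double arg = -2.0 * M_PI * k * t / n;
         re += series[t] * MathCos(arg);
         im += series[t] * MathSin(arg);
        }
      freq_stream[k] = MathSqrt(re * re + im * im) / n;
     }
  }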
Thus, integrating Chimera and Hidformer within GSM++ has the potential to achieve high dependency encoding accuracy, minimize market noise influence, and enhance the reliability of analytical forecasts.
SSM Module Adjustment
It is worth noting that during testing of the model built using the Chimera framework, we observed extended position-holding durations. At that time, it was hypothesized that the model was capturing only long-term trends while ignoring short-term fluctuations. To address this limitation, we decided to slightly modify the previously implemented object and add an additional internal two-dimensional state-space model. The updated algorithms are implemented in the CNeuronChimeraPlus object, whose structure is outlined below.
class CNeuronChimeraPlus :  public CNeuronChimera
  {
protected:
   CNeuron2DSSMOCL   cSSMPlus;
   CLayer            cDiscretizationPlus;
   //---
   virtual bool      feedForward(CNeuronBaseOCL *NeuronOCL) override;
   //---
   virtual bool      calcInputGradients(CNeuronBaseOCL *NeuronOCL) override;
   virtual bool      updateInputWeights(CNeuronBaseOCL *NeuronOCL) override;

public:
                     CNeuronChimeraPlus(void) {};
                    ~CNeuronChimeraPlus(void) {};
   //---
   virtual bool      Init(uint numOutputs, uint myIndex, COpenCLMy *open_cl,
                          uint window_in, uint window_out,
                          uint units_in, uint units_out,
                          ENUM_OPTIMIZATION optimization_type, uint batch);
   //---
   virtual int       Type(void) override const { return defNeuronChimeraPlus; }
   //---
   virtual bool      Save(int const file_handle) override;
   virtual bool      Load(int const file_handle) override;
   //---
   virtual bool      WeightsUpdate(CNeuronBaseOCL *source, float tau);
   virtual void      SetOpenCL(COpenCLMy *obj) override;
   //---
   virtual bool      Clear(void) override;
  };
As can be seen from the structure of the new object, we did not completely rewrite the previously created CNeuronChimera object. On the contrary, it was used as a parent class, allowing us to inherit all previously implemented functionality. However, adding a third 2D-SSM module along with the corresponding data projection block necessitates overriding the usual set of virtual methods. Initialization of both newly declared and inherited objects occurs in the Init method, whose parameter structure is fully inherited from the corresponding method of the parent class.
bool CNeuronChimeraPlus::Init(uint numOutputs, uint myIndex, COpenCLMy *open_cl,
                              uint window_in, uint window_out,
                              uint units_in, uint units_out,
                              ENUM_OPTIMIZATION optimization_type, uint batch)
  {
   if(!CNeuronChimera::Init(numOutputs, myIndex, open_cl, window_in, window_out,
                            units_in, units_out, optimization_type, batch))
      return false;
Within the method body, we begin by calling the corresponding method of the parent class, which already implements the mechanisms for parameter validation and initialization of inherited components.
After successfully executing the parent class method operations, we proceed to initialize the newly declared objects, which provide the extended functionality of the model. One of the key components added at this stage is the additional two-dimensional state-space (2D-SSM) module.
It is important to note that the parent class method has already initialized two 2D-SSM modules, each performing a specific task. One module operates within the specified result dimensions, providing standard encoding of spatial dependencies, while the second uses an extended feature space, enabling the capture of more complex and multi-level relationships between analysis elements.
To increase the model's generalization capability and improve the accuracy of market data processing, the additional 2D-SSM module differs from the existing ones by functioning within the specified feature space but with an extended projection along the temporal dimension. This architecture allows for more precise analysis of time series and spatially distributed market data.
   int index = 0;
   if(!cSSMPlus.Init(0, index, OpenCL, window_in, window_out, units_in, 2 * units_out,
                     optimization, iBatch))
      return false;
Next, it is necessary to project the results into the designated subspace. It should be noted that we cannot immediately perform the projection along the temporal dimension; therefore, we need to assemble a small internal sequence of objects.
We begin by preparing a dynamic array and local variables to store pointers to the objects being created.
   CNeuronTransposeOCL *transp = NULL;
   CNeuronConvOCL *conv = NULL;
   cDiscretizationPlus.Clear();
   cDiscretizationPlus.SetOpenCL(OpenCL);
First, we create a data transposition object, which allows us to transform the data into the required format.
   index++;
   transp = new CNeuronTransposeOCL();
   if(!transp ||
      !transp.Init(0, index, OpenCL, 2 * units_out, window_out, optimization, iBatch) ||
      !cDiscretizationPlus.Add(transp))
     {
      delete transp;
      return false;
     }
Next, we add a convolutional data projection layer along the specified temporal dimension.
   index++;
   conv = new CNeuronConvOCL();
   if(!conv ||
      !conv.Init(0, index, OpenCL, 2 * units_out, 2 * units_out, units_out, window_out, 1,
                 optimization, iBatch) ||
      !cDiscretizationPlus.Add(conv))
     {
      delete conv;
      return false;
     }
   conv.SetActivationFunction(None);
Afterward, the data are returned to their original representation using another transposition object.
   index++;
   transp = new CNeuronTransposeOCL();
   if(!transp ||
      !transp.Init(0, index, OpenCL, window_out, units_out, optimization, iBatch) ||
      !cDiscretizationPlus.Add(transp))
     {
      delete transp;
      return false;
     }
   transp.SetActivationFunction((ENUM_ACTIVATION)conv.Activation());
//---
   return true;
  }
This completes the initialization of internal objects, and the method concludes by returning a logical result to the calling program.
Next, we need to implement the feed-forward pass algorithm in the feedForward method. It should be noted that the output of the corresponding parent class method includes data normalization. As you know, this operation alters the data distribution. Summing these normalized values with the non-normalized outputs of the added information stream could cause unpredictable bias toward one of the pathways. To prevent this, we completely rewrite the feedForward method.
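A toy numeric illustration of this problem is shown below; the values are invented solely to demonstrate the scale mismatch between a layer-normalized stream and a raw stream, and the helper name is hypothetical.

//+------------------------------------------------------------------+
//| Toy illustration of the scale mismatch (invented numbers)        |
//+------------------------------------------------------------------+
void NormalizationBiasDemo(void)
  {
   vector normalized = {0.8, -1.2, 0.4};        // layer-normalized stream, roughly unit scale
   vector raw        = {120.5, 95.3, 210.7};    // un-normalized stream, price-like scale
   vector sum        = normalized + raw;        // the raw stream dominates the result
   PrintFormat("sum = [%.1f, %.1f, %.1f]", sum[0], sum[1], sum[2]);
   // the contribution of the normalized branch is practically invisible
  }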
bool CNeuronChimeraPlus::feedForward(CNeuronBaseOCL *NeuronOCL)
  {
   for(uint i = 0; i < caSSM.Size(); i++)
     {
      if(!caSSM[i].FeedForward(NeuronOCL))
         return false;
     }
   if(!cSSMPlus.FeedForward(NeuronOCL))
      return false;
The method receives a pointer to the input data object, which is immediately passed to the identically named methods of the internal state-space models. These models generate results in three different projections.
Next, using internal discretization objects, we bring the state-space model outputs into a comparable form.
   if(!cDiscretization.FeedForward(caSSM[1].AsObject()))
      return false;
   CNeuronBaseOCL *inp = NeuronOCL;
   CNeuronBaseOCL *current = NULL;
   for(int i = 0; i < cDiscretizationPlus.Total(); i++)
     {
      current = cDiscretizationPlus[i];
      if(!current ||
         !current.FeedForward(inp))
         return false;
      inp = current;
     }
Additionally, we obtain a projection of the input data along the residual connection pathway.
   inp = NeuronOCL;
   for(int i = 0; i < cResidual.Total(); i++)
     {
      current = cResidual[i];
      if(!current ||
         !current.FeedForward(inp))
         return false;
      inp = current;
     }
Finally, the four information streams are summed. In this case, normalization of values is implemented only at the final stage.
   inp = cDiscretizationPlus[-1];
   if(!SumAndNormilize(caSSM[0].getOutput(), cDiscretization.getOutput(), Output, 1, false, 0, 0, 0, 1) ||
      !SumAndNormilize(Output, inp.getOutput(), Output, 1, false, 0, 0, 0, 1) ||
      !SumAndNormilize(Output, current.getOutput(), Output, cDiscretization.GetFilters(), true, 0, 0, 0, 1))
      return false;
//---
   return true;
  }
We return the logical result of the performed operations to the calling program and complete the execution of the method.
The next stage of our work involves building backpropagation algorithms. They are implemented in two methods: calcInputGradients and updateInputWeights. The first handles the distribution of error gradients among participating objects. The second adjusts the trainable parameters of the models. At this stage, unlike in the feedforward method, we can use the functionality of the parent class.
The gradient distribution method receives a pointer to the same input data object, which must now be populated with error gradient values reflecting the influence of the input data on the model output.
bool CNeuronChimeraPlus::calcInputGradients(CNeuronBaseOCL *NeuronOCL)
  {
   if(!CNeuronChimera::calcInputGradients(NeuronOCL))
      return false;
Unlike the standard approach, the pointer's validity is not checked; it is immediately passed to the parent class method of the same name. It already implements control points and the algorithm for distributing error gradients across the three inherited information streams (two 2D-SSMs and the residual connection pathway).
Next, the gradient is propagated only through the added pathway. First, the error gradient received from subsequent objects is adjusted by the derivative of the activation function of the last layer in the added discretization model.
   CNeuronBaseOCL *current = cDiscretizationPlus[-1];
   if(!current ||
      !DeActivation(current.getOutput(), current.getGradient(), Gradient, current.Activation()))
      return false;
The adjusted values are then passed through the discretization block in reverse. For this, we iterate backward through the elements and sequentially call the corresponding methods of the relevant objects.
   for(int i = cDiscretizationPlus.Total() - 2; i >= 0; i--)
     {
      current = cDiscretizationPlus[i];
      if(!current ||
         !current.calcHiddenGradients(cDiscretizationPlus[i + 1]))
         return false;
     }
The error gradient is then propagated through the two-dimensional state-space model.
   if(!cSSMPlus.calcHiddenGradients(current.AsObject()))
      return false;
Finally, the gradient is returned to the level of the input data. The input data object's buffer already contains the gradient propagated through the three inherited information streams. Therefore, to avoid overwriting these data, the pointer to the error gradient buffer is temporarily swapped so that the newly computed values can be accumulated separately.
   current = cResidual[0];
   CBufferFloat *temp = NeuronOCL.getGradient();
   if(!NeuronOCL.SetGradient(current.getGradient(), false) ||
      !NeuronOCL.calcHiddenGradients(cSSMPlus.AsObject()) ||
      !SumAndNormilize(temp, NeuronOCL.getGradient(), temp, 1, false, 0, 0, 0, 1) ||
      !NeuronOCL.SetGradient(temp, false))
      return false;
//---
   return true;
  }
Next, we propagate the error gradient from the two-dimensional state-space model to the input data level and sum the obtained values with the previously accumulated ones.
We return the pointers to the data buffers to their original state.
This completes the review of the algorithms for the enhanced Chimera module. The full code of the CNeuronChimeraPlus class and all its methods can be found in the attached files.
Building the Hybrid Decoder
After constructing the upgraded Chimera module, we move on to developing the hybrid encoder. As mentioned earlier, in our implementation it contains the Chimera module and the Hidformer block. The Hidformer object receives data on the analyzed system state as input and generates a tensor of agent actions as output. And in this context, it would probably be more correct to refer to our new object as a hybrid decoder. The object structure is presented below.
class CNeuronHypridDecoder :  public CNeuronHidformer
  {
protected:
   CNeuronChimeraPlus   cChimera;
   //---
   virtual bool      feedForward(CNeuronBaseOCL *NeuronOCL) override;
   virtual bool      updateInputWeights(CNeuronBaseOCL *NeuronOCL) override;
   virtual bool      calcInputGradients(CNeuronBaseOCL *NeuronOCL) override;

public:
                     CNeuronHypridDecoder(void) {};
                    ~CNeuronHypridDecoder(void) {};
   //---
   virtual bool      Init(uint numOutputs, uint myIndex, COpenCLMy *open_cl,
                          uint window, uint window_key, uint units_count,
                          uint heads, uint layers, uint stack_size, uint nactions,
                          ENUM_OPTIMIZATION optimization_type, uint batch);
   //---
   virtual int       Type(void) override const { return defNeuronHypridDecoder; }
   //---
   virtual bool      Save(int const file_handle) override;
   virtual bool      Load(int const file_handle) override;
   //---
   virtual bool      WeightsUpdate(CNeuronBaseOCL *source, float tau) override;
   virtual void      SetOpenCL(COpenCLMy *obj) override;
   //---
   virtual bool      Clear(void) override;
  };
The structure declares only one internal object — the modified Chimera module, whose algorithms were described above. The CNeuronHidformer object is used as the parent class, which avoids redundant duplication of functionality and allows efficient reuse of already implemented methods and structures without explicitly creating an additional instance inside the object. Nevertheless, we need to override the usual set of virtual methods.
The internal object is declared statically, so the constructor and destructor of the new class remain empty. Initialization of the declared and inherited objects is performed in the Init method.
bool CNeuronHypridDecoder::Init(uint numOutputs, uint myIndex, COpenCLMy *open_cl,
                                uint window, uint window_key, uint units_count,
                                uint heads, uint layers, uint stack_size, uint nactions,
                                ENUM_OPTIMIZATION optimization_type, uint batch)
  {
   if(!CNeuronHidformer::Init(numOutputs, myIndex, open_cl, window_key, window_key,
                              nactions, heads, layers, stack_size, nactions,
                              optimization_type, batch))
      return false;
The method parameters include a set of constants that uniquely define the architecture of the object being created. It should be noted that the parameter structure is fully inherited from the identically named method of the parent class. However, when calling the parent class method, the received values are not passed in the same form. This is because the feedforward method of the parent class is intended to receive, not the raw data from the external program, but the outputs of the internal Chimera module. Accordingly, during the initialization of inherited parent class objects, the result dimensions of the internal module are specified as the input data. Here, the feature dimensionality is set to the value of the internal state vector. The sequence length matches the agent's action space. In other words, the Chimera module outputs a latent state tensor, where each row represents a token corresponding to a separate element of the agent's actions.
After successfully executing the parent class method, the identically named method of the Chimera module is called, specifying the input data dimensions and the desired result tensor dimensions.
   if(!cChimera.Init(0, 0, OpenCL, window, window_key, units_count, nactions,
                     optimization, iBatch))
      return false;
//---
   return true;
  }
The method then returns a logical execution result, completing the operation.
You may have noticed that the algorithm of the object initialization method is quite simple. The same simplicity applies to the other methods. For example, the feedForward method receives a pointer to the input data object, which is immediately passed to the identically named Chimera module method.
bool CNeuronHypridDecoder::feedForward(CNeuronBaseOCL *NeuronOCL)
  {
   if(!cChimera.FeedForward(NeuronOCL))
      return false;
   return CNeuronHidformer::feedForward(cChimera.AsObject());
  }
The results are then passed to the identically named parent class method, and its logical result is returned to the calling program, completing the method.
The remaining methods of this class can be reviewed independently. Their full code can be found in the attachment.
Model Architecture
Having completed the construction of the individual building blocks of the GSM++ framework, we proceed to assembling the complete model architecture. In this case, we train a single model - the Actor. The architecture description is provided in the CreateDescriptions method.
bool CreateDescriptions(CArrayObj *&actor)
  {
//---
   CLayerDescription *descr;
//---
   if(!actor)
     {
      actor = new CArrayObj();
      if(!actor)
         return false;
     }
The method receives a pointer to a dynamic array for storing the sequence of objects describing the model architecture. We immediately verify the pointer validity. If necessary, we create a new object instance.
Next, we create descriptions for the input data layer. As usual, a fully connected layer of sufficient size is used.
//--- Actor
   actor.Clear();
//--- Input layer
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBaseOCL;
   int prev_count = descr.count = (HistoryBars * BarDescr);
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
The model receives raw data from the terminal. Initial preprocessing is done by the model. For this purpose, a batch normalization layer is applied to standardize the disparate values of the input data.
//--- layer 1
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronBatchNormOCL;
   descr.count = prev_count;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
This is followed by the mixed tokenization module.
//--- layer 2
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronMoT;
   descr.window = BarDescr;
   descr.count = HistoryBars;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
Next, the S3 module is used to learn the optimal token permutation method. It identifies the best element order considering their interdependencies and significance within the overall data structure.
//--- layer 3
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronS3;
   descr.count = HistoryBars;
   descr.window = BarDescr;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
The processed data are then passed to the local node encoder, implemented by the NAFS module.
//--- layer 4
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronNAFS;
   descr.count = HistoryBars;
   descr.window = BarDescr;
   descr.window_out = BarDescr;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
The Agent action tensor is generated by the hybrid decoder module, whose algorithms were described above.
//--- layer 5
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronHypridDecoder;
//--- Windows
     {
      int temp[] = {BarDescr, 120, NActions};      // Window, Stack Size, N Actions
      if(ArrayCopy(descr.windows, temp) < int(temp.Size()))
         return false;
     }
   descr.count = HistoryBars;
   descr.window_out = 32;
   descr.step = 4;                                  // Heads
   descr.layers = 3;
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
It is important to note that the architecture we developed for the Agent is focused solely on analyzing environmental states. However, this is insufficient for comprehensive risk assessment, as the model does not account for available assets and their impact on decision-making.
To solve this, the architecture is augmented with a risk management block, borrowed from previously considered models.
//--- layer 6
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronMacroHFTvsRiskManager;
//--- Windows
     {
      int temp[] = {3, 15, NActions, AccountDescr};   // Window, Stack Size, N Actions, Account Description
      if(ArrayCopy(descr.windows, temp) < int(temp.Size()))
         return false;
     }
   descr.count = 10;
   descr.window_out = 16;
   descr.step = 4;                                    // Heads
   descr.batch = 1e4;
   descr.activation = None;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
//--- layer 7
   if(!(descr = new CLayerDescription()))
      return false;
   descr.type = defNeuronConvOCL;
   descr.count = NActions / 3;
   descr.window = 3;
   descr.step = 3;
   descr.window_out = 3;
   descr.activation = SIGMOID;
   descr.optimization = ADAM;
   if(!actor.Add(descr))
     {
      delete descr;
      return false;
     }
//---
   return true;
  }
After creating the architecture description, the method returns a logical execution result to the calling program.
The full code of the architecture description method is provided in the attachment. The attachment also contains the model training and testing programs, carried over from previous work, which are available for independent review.
Testing
We have completed extensive work implementing our interpretation of the approaches proposed by the GSM++ framework authors. We have now reached a critical stage — evaluating the effectiveness of the implemented solutions using real historical data.
It should be noted that the final neural layers of the Agent closely replicate the architecture used in our Hidformer framework implementation. The same risk management module structure is applied, and the CNeuronHidformer object is used at the hybrid decoder output. This architectural similarity makes it reasonable to compare the performance of the new model with the Hidformer framework.
For a fair comparison, both models were trained on the same dataset previously used for Hidformer training. Recall that:
- The training set consists of historical EURUSD M1 data for the entire 2024 calendar year.
- All analyzed indicator parameters remain at default values, without additional optimization, eliminating external factor influence.
- Testing of the trained model was conducted on January 2025 historical data, keeping all other parameters unchanged to ensure objective comparison.
The testing results are presented below.

During the test period, the model executed 15 trades, which is relatively low for high-frequency trading on the M1 timeframe. This figure is even below that achieved by the baseline Hidformer model. Only 7 trades were profitable (46.67%), which is also lower than the baseline's 62.07%; the accuracy of short positions in particular declined. However, the average loss decreased slightly while the average profitable trade grew in relative terms.
If the baseline model’s ratio of average profitable to losing trades was 1.6, in the new model this ratio exceeds 4. This nearly doubled overall profit for the test period, with a corresponding increase in the profit factor. This suggests that the new architecture prioritizes loss minimization and profit maximization for successful trades. This may lead to more stable financial results over the long term. However, the short test period and small number of trades prevent conclusions about long-term model performance.
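As a rough sanity check of these figures, the profit factor can be estimated from the win rate and the payoff ratio. The helper below is illustrative arithmetic only (the payoff ratio of 4.0 is an assumption standing in for "exceeds 4"); it is not the strategy tester's exact statistics.

//+------------------------------------------------------------------+
//| Rough profit-factor estimate from win rate and payoff ratio      |
//| (illustrative arithmetic, not the strategy tester's statistics)  |
//+------------------------------------------------------------------+
double ProfitFactorEstimate(const double win_rate, const double payoff_ratio)
  {
   // profit factor = gross profit / gross loss
   //               = (wins * avg_win) / (losses * avg_loss)
   return (win_rate * payoff_ratio) / (1.0 - win_rate);
  }
// ProfitFactorEstimate(0.4667, 4.0) ≈ 3.5  (new model, payoff ratio of 4 assumed)
// ProfitFactorEstimate(0.6207, 1.6) ≈ 2.6  (baseline, same formula)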
Conclusion
We have explored the GSM++ unified graph sequence processing framework, which combines advanced methods for analyzing market data. Its main advantage lies in the hybrid data representation, incorporating hierarchical tokenization, local node encoding, and global dependency encoding. This multi-level approach efficiently extracts significant patterns and forms highly informative embeddings, which are critical for forecasting financial time series.
In the practical part of this work, we implemented our own interpretation of the proposed approaches using MQL5. It is important to note that there are substantial differences between our implementation and the authors' original methods. Therefore, all testing results apply exclusively to the implemented solution.
The trained model demonstrated the ability to generate profit on out-of-sample data, although not at the level we would like to see. This indicates the potential of the implemented approaches, but further work is required, including training on a more representative dataset, comprehensive testing, and optimization of the analyzed indicators and their parameters. The model identifies patterns in the training data rather than creating them.
Programs used in the article
| # | Name | Type | Description |
|---|---|---|---|
| 1 | Research.mq5 | Expert Advisor | Expert Advisor for collecting samples |
| 2 | ResearchRealORL.mq5 | Expert Advisor | Expert Advisor for collecting samples using the Real-ORL method |
| 3 | Study.mq5 | Expert Advisor | Model training Expert Advisor |
| 4 | Test.mq5 | Expert Advisor | Model testing Expert Advisor |
| 5 | Trajectory.mqh | Class library | System state and model architecture description structure |
| 6 | NeuroNet.mqh | Class library | A library of classes for creating a neural network |
| 7 | NeuroNet.cl | Code Base | OpenCL program code |
Translated from Russian by MetaQuotes Ltd.
Original article: https://www.mql5.com/ru/articles/17310
Warning: All rights to these materials are reserved by MetaQuotes Ltd. Copying or reprinting of these materials in whole or in part is prohibited.
This article was written by a user of the site and reflects their personal views. MetaQuotes Ltd is not responsible for the accuracy of the information presented, nor for any consequences resulting from the use of the solutions, strategies or recommendations described.