Taking Neural Networks to the next level - page 35

 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA development

Sergey Golubev, 2023.10.16 06:17

Neural networks made easy (Part 38): Self-Supervised Exploration via Disagreement

The algorithm is built on self-supervised learning: the agent uses information obtained while interacting with the environment to generate an "intrinsic" reward and update its strategy. It relies on several agent models that interact with the environment and produce different predictions. When the models disagree, the event is considered "interesting", and the agent is incentivized to explore that region of the environment. In this way, the algorithm encourages the agent to explore new areas of the environment and allows it to make more accurate predictions about future rewards.
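
As a rough illustration of the idea (a minimal Python sketch, not the article's MQL5 implementation; the ensemble interface and the weighting coefficient are assumptions), the intrinsic reward can be taken as the variance of the predictions produced by an ensemble of forward models:

```python
import numpy as np

def disagreement_reward(ensemble, state, action):
    """Intrinsic reward: variance of next-state predictions across the ensemble.

    ensemble : list of callables, each mapping (state, action) -> predicted next state.
    High variance means the models disagree, i.e. an "interesting" region to explore.
    """
    preds = np.stack([model(state, action) for model in ensemble])  # shape (k, state_dim)
    return preds.var(axis=0).mean()

# The exploration bonus is combined with the extrinsic reward using some weight beta:
# total_reward = extrinsic_reward + beta * disagreement_reward(ensemble, state, action)
```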

 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA development

Sergey Golubev, 2024.02.27 17:50

Data Science and Machine Learning (Part 20): Algorithmic Trading Insights, A Faceoff Between LDA and PCA in MQL5

LDA is a supervised machine learning algorithm that aims to find a linear combination of features that best separates the classes in a dataset.

Just like Principal Component Analysis (PCA), it is a dimensionality reduction algorithm. Both are a common choice for dimensionality reduction, so in this article we are going to compare them and observe in which situations each algorithm works best. We already discussed PCA in the prior articles of this series, so let us commence by observing what the LDA algorithm is all about, as it will be our main focus; finally, we will compare their performance on a simple dataset and in the strategy tester, so make sure you stick around to the end for some awesome data science stuff.
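
For readers who want a quick feel for the difference before diving into the MQL5 code, here is a hedged Python/scikit-learn sketch on a toy dataset (the dataset and the number of components are arbitrary choices, not the article's setup):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it ignores the labels y and keeps directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it uses the class labels to find directions
# that best separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```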

 

Neural networks made easy (Part 60): Online Decision Transformer (ODT)

The last two articles were devoted to the Decision Transformer method, which models action sequences in the context of an autoregressive model of desired rewards. As you might remember, the practical tests in both articles showed fairly good profitability of the trained model at the beginning of the testing period. Further on, the model's performance deteriorates and a series of unprofitable trades is observed, leading to losses. The amount of losses may exceed the previously earned profit.
In this article, we will look at another optimization algorithm for this method.
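
As a reminder of what "desired rewards" means here, below is a minimal Python sketch (illustrative only, assuming undiscounted returns; the article's implementation is in MQL5) of the return-to-go sequence that accompanies the states when the transformer predicts each action:

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Desired-reward tokens: the (discounted) sum of future rewards at each step."""
    rtg = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

rewards = np.array([0.5, -0.2, 1.0, 0.3])
print(returns_to_go(rewards))  # [1.6 1.1 1.3 0.3]
```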
 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA

Sergey Golubev, 2024.03.07 13:50

Neural networks made easy (Part 61): Optimism issue in offline reinforcement learning

Recently, offline reinforcement learning methods have become widespread, promising solutions to problems of varying complexity. However, one of the main problems researchers face is the optimism that can arise during learning. The agent optimizes its strategy based on the data in the training set and grows confident in its actions. But the training set often cannot cover the entire variety of possible environment states and transitions. In a stochastic environment, such confidence turns out to be not entirely justified. In such cases, the agent's optimistic strategy may lead to increased risks and undesirable consequences.
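
A common remedy, shown here only as a hedged sketch and not as the specific algorithm from the article, is to make the value estimate pessimistic by penalizing it with the disagreement of an ensemble of critics:

```python
import numpy as np

def pessimistic_value(critics, state, action, beta=1.0):
    """critics: list of callables Q_i(state, action) -> float."""
    q_values = np.array([q(state, action) for q in critics])
    # For states and actions poorly covered by the training set the critics tend to
    # disagree, so the penalty grows and the agent's over-confidence is reduced.
    return q_values.mean() - beta * q_values.std()
```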

 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA development

Sergey Golubev, 2024.03.08 16:08

The Disagreement Problem: Diving Deeper into The Complexity Explainability in AI

The disagreement problem is an open area of research in an interdisciplinary field known as Explainable Artificial Intelligence (XAI). Explainable Artificial Intelligence attempts to help us understand how our models arrive at their decisions, but unfortunately everything is easier said than done.

We are all aware that machine learning models and the available datasets are growing larger and more complex. As a matter of fact, the data scientists who develop machine learning algorithms cannot exactly explain their algorithms' behaviour across all possible datasets. Explainable Artificial Intelligence (XAI) helps us build trust in our models, explain their functionality and validate that the models are ready to be deployed in production; but as promising as that may sound, this article will show the reader why we cannot blindly trust any explanation produced by Explainable Artificial Intelligence technology.
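
A small, hedged Python sketch of the disagreement problem itself (the model, dataset and explanation methods are arbitrary stand-ins, not the article's trading setup): two explanation techniques applied to the same model can rank features differently.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Explanation 1: impurity-based importances. Explanation 2: permutation importances.
rank_a = np.argsort(model.feature_importances_)[::-1][:5]
rank_b = np.argsort(
    permutation_importance(model, X, y, random_state=0).importances_mean
)[::-1][:5]

# If the two top-5 sets differ, the explanations "disagree" about what drives the model.
print(set(rank_a), set(rank_b), set(rank_a) & set(rank_b))
```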


 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA

Sergey Golubev, 2024.04.18 16:47

Neural networks made easy (Part 66): Exploration problems in offline learning

As we move along this series of articles devoted to reinforcement learning methods, we face the question of the balance between exploration of the environment and exploitation of the learned policy. We have previously considered various methods of stimulating the Agent to explore. But quite often, algorithms that demonstrate excellent results in online learning are not as effective offline. The problem is that in offline mode, information about the environment is limited by the size of the training dataset. Most often, the data selected for model training is narrowly targeted, as it is collected within a small subspace of the task. This provides an even more limited idea of the environment. However, in order to find the optimal solution, the Agent needs the most complete understanding of the environment and its patterns. We have noted earlier that learning results often depend on the training dataset.

In this article, we will get acquainted with the Exploratory Data for Offline RL (ExORL) framework, which was presented in the paper "Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning". The results presented in that article demonstrate that the correct approach to data collection has a significant impact on the final learning outcomes. This impact is comparable to that of the choice of learning algorithm and model architecture.
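
In outline, the recipe looks roughly like the following sketch (function and variable names are placeholders, not the ExORL framework's actual API): collect trajectories with a task-agnostic exploratory policy, relabel them with the task reward, and then feed the resulting dataset to any offline RL algorithm.

```python
def build_exorl_dataset(env, explore_policy, reward_fn, n_episodes):
    """Assumed interface: env.reset() -> state, env.step(action) -> (next_state, done)."""
    dataset = []
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = explore_policy(state)                 # reward-free exploration
            next_state, done = env.step(action)
            reward = reward_fn(state, action, next_state)  # relabel for the target task
            dataset.append((state, action, reward, next_state, done))
            state = next_state
    return dataset

# Any standard offline RL learner can then be trained on this dataset:
# offline_agent.train(build_exorl_dataset(env, explore_policy, reward_fn, n_episodes=100))
```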

 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA

Sergey Golubev, 2024.04.18 16:54

Neural networks made easy (Part 67): Using past experience to solve new tasks

Reinforcement learning is built on maximizing the reward received from the environment while interacting with it. Obviously, the learning process requires constant interaction with the environment. However, in practice, situations vary: when solving some tasks, we may encounter various restrictions on such interaction. A possible solution in these situations is to use offline reinforcement learning algorithms, which allow you to train models on a limited archive of trajectories collected during preliminary interaction with the environment, while it was still available.
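
Schematically (a minimal sketch with illustrative names, not the article's code), the training loop of such an algorithm only samples from a fixed archive of transitions and never calls the environment:

```python
import random

def sample_batch(archive, batch_size=64):
    """archive: list of (state, action, reward, next_state, done) tuples
    collected during the earlier interaction phase."""
    return random.sample(archive, min(batch_size, len(archive)))

# Training loop outline:
# for step in range(n_steps):
#     batch = sample_batch(archive)
#     agent.update(batch)          # no env.step() calls anywhere
```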

 

Forum on trading, automated trading systems and testing trading strategies

Better NN EA

Sergey Golubev, 2024.04.28 17:35

Neural networks made easy (Part 68): Offline Preference-guided Policy Optimization

Reinforcement learning is a universal platform for learning optimal behavior policies in the environment under exploration. Policy optimality is achieved by maximizing the rewards received from the environment during interaction with it. But herein lies one of the main problems of this approach: creating an appropriate reward function often requires significant human effort. Additionally, rewards may be sparse and/or insufficient to express the true learning goal. As one option for solving this problem, the authors of the paper "Beyond Reward: Offline Preference-guided Policy Optimization" suggested the OPPO method (Offline Preference-guided Policy Optimization). They propose replacing the reward given by the environment with the preferences of a human annotator between two trajectories completed in the environment under exploration. Let's take a closer look at the proposed algorithm.
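
As a hedged illustration of how pairwise preferences can replace a reward signal (this is a generic Bradley-Terry style loss, not claimed to be the exact OPPO objective), consider:

```python
import torch

def preference_loss(score_preferred, score_rejected):
    """score_*: scalar tensors scoring each trajectory (e.g. a learned model's
    summed per-step scores). The loss pushes the preferred trajectory's score up."""
    return -torch.nn.functional.logsigmoid(score_preferred - score_rejected).mean()

# Example: the annotator preferred trajectory A over trajectory B.
loss = preference_loss(torch.tensor(1.2), torch.tensor(0.4))
```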
