New article Random Decision Forest in Reinforcement learning has been published:
Random Forest (RF), built on bagging, is one of the most powerful machine learning methods, only slightly inferior to gradient boosting. This article attempts to develop a self-learning trading system that makes decisions based on the experience gained from interacting with the market.
It can be said that a Random Forest is a special case of bagging in which decision trees are used as the base family. At the same time, unlike the conventional decision tree construction method, pruning is not used: the method is intended to build a composition from large data samples as fast as possible. Each tree is built in a specific way: the feature (attribute) for a tree node is selected not from the total set of features but from a random subset of them. When constructing a regression model, the number of candidate features is n/3; in case of classification, it is √n.
These are empirical recommendations, and the effect is called decorrelation: different sets of features fall into different trees, and the trees are trained on different samples.
Fig. 1. Scheme of random forest operation
Author: Maxim Dmitrievsky
Thank you for sharing this very useful article.
I was trying to add more indicators to the code, but I am not an expert programmer and have little experience with membership functions, so I couldn't work out how to add more indicators to be used along with the rules inside the OnInit() function. The code contains only the RSI indicator and creates the BUY and SELL rules from it. Could you please provide a few more example codes for indicators such as a Moving Average, MACD, Stochastic or SAR to be used in the code?
In particular, I want to know how to create rules and add them to the entry conditions while comparing them with the current price. The main problem with the current code is that it sometimes holds losing trades for a long time while closing profitable trades quickly, so any advice on this will be appreciated. I think more filtering needs to be done on the exit logic.
Also, I have one question if you can answer please:
Does the OPT file update continuously in order to improve the entries and exits over time by fine-tuning the policy itself?
Or does the EA just use the strategy tester to optimise its values and reuse the same entry and exit values that were profitable recently, like a regular optimised EA?
I mean, like other neural network EAs, does it fine-tune its overall policy for trade entries and exits during the course of trading?
Hello. Please wait for the next article. There I will examine the use of other indicators as well as a variety of agents, and without fuzzy logic.
Thanks for your quick reply, I will wait.
If you don't mind, could you please mention when you are planning to publish the next article with the indicator implementation code?
I think 1-2 weeks for the Russian version, and then they will translate it.
But even if the article is in Russian, it will still be available for download and use in regular MT5, right?
By the way, I have a few suggestions for improving the logic of the reward function. Since I am not 100% sure about the coding, I am not touching the code for now. But if you find my suggestions useful, please consider implementing them in the next release.
I want the reward function to reward the agent based on the sum of rewards from two factors:
1. Net profit (OrderProfit() + OrderCommission() + OrderSwap()): the net profit (NP) for a particular order close yields a reward (RW1) scaled by a reward factor (RF1) whose value is anything greater than 1, so that the bigger the trade profit, the bigger the reward, and the bigger the loss, the bigger the negative reward.
RW1 = NP * MathPow(RF1, 2); for example, RF1 can be 1.1, 1.2 or more.
That is, RW1 for a particular order close should be the net profit (positive or negative) multiplied by the square of the reward factor RF1.
2. Number of consecutive profits or losses (NCount): another reward (RW2) should be given based on consecutive profits or losses, determined by a reward factor (RF2) and the number of consecutive losses or profits (NCount).
Before giving a reward, the EA should check the last order's profit or loss for that particular symbol. If it is a net loss, considering OrderCommission and OrderSwap, a more negative reward is given; if the order closed in profit, a more positive reward is given.
RW2 = MathPow(NCount, RF2); RF2 can be 1.1, 1.2 or more.
For example, if the current closed order is positive and the previous order is negative, then NCount = 1 - 1 = 0 and RW2 will be zero. If the current closed order is positive and the previous order is also positive, then NCount = 1 + 1 = 2, and so RW2 = 2^1.2, considering RF2 = 1.2.
The net reward (RW) for a particular order close should then be RW = RW1 + RW2.
If you can implement this in the next version of the EA, that would be great; or if you can just tell me where to add this code, I will try to do it myself.
The best part, I think, would be declaring RF1 and RF2 as global input variables for optimisation, so the EA could find the best combination of RF1 and RF2 during forward testing.
Hi Maxim Dmitrievsky,
Sorry to ask so many questions... I found this article very useful while studying and implementing machine learning in MQL5. Of course, I am waiting for your next article.
My question is: how do I train the agent to develop a policy that maximises profit, NOT the number of profitable trades?
I studied the updateReward() and updatePolicy() code, and it seems to me to optimise only the number of profitable trades, ignoring the profit of each trade and whether the account balance is growing or not.
So could you please shed some light on how to integrate the profit amounts into the reward function, and is there a way to do that?
I tried to implement my own code proposed above, but apparently it doesn't work (though there was no coding error), or I don't know how to implement it. Or perhaps I did not completely understand what exactly the updateReward() and updatePolicy() functions do. I would really appreciate it if you could explain a little more about the code inside those two functions: what exactly RDFPolicyMatrix stores, and how its values are used during the next trade entries.
Thanks in advance.
I see the best results of the EA immediately after optimisation, and as the agent trains itself, the results become worse day by day. So I was wondering if it is possible to launch the optimiser from the code after every loss: instead of updating the reward on every loss, the EA would re-optimise itself on the last couple of days of data up to today.
If either the author or anyone else knows how to implement this, then kindly share it.
Dear Maxim Dmitrievsky,
Could you please let us know whether you have published your next article regarding the implementation of Random Decision Forest with different agents and without fuzzy logic, which you mentioned previously?
Thank you very much