New article Random Decision Forest in Reinforcement learning has been published:
Random Forest (RF), built on bagging, is one of the most powerful machine learning methods, only slightly inferior to gradient boosting. This article attempts to develop a self-learning trading system that makes decisions based on the experience gained from interacting with the market.
It can be said that a Random Forest is a special case of bagging in which decision trees are used as the base family. At the same time, unlike the conventional decision tree construction method, pruning is not used: the method is intended to build a composition from large data samples as fast as possible. Each tree is built in a specific way: the feature (attribute) for a tree node is selected not from the total set of features but from a random subset of them. When constructing a regression model, the number of candidate features is n/3; in case of classification, it is √n.
These are empirical recommendations, and the effect is called decorrelation: different sets of features fall into different trees, and the trees are trained on different samples.
Fig. 1. Scheme of random forest operation
Author: Maxim Dmitrievsky
Thank you for sharing this very useful article.
I was trying to add more indicators to the code, but I am not an expert programmer and have little experience with membership functions, so I couldn't work out how to add more indicators to be used along with the rules inside the OnInit() function. The code contains only the RSI indicator and creates the BUY and SELL rules from it. Could you please provide a few more example codes for indicators such as a Moving Average, MACD, Stochastic or SAR to be used in the code?
In particular, I want to know how to create rules and add them to the entry conditions while comparing them with the current price. The main problem with the current code is that it sometimes holds losing trades for a long time while closing profitable trades quickly, so any advice on this will be appreciated. I think more filtering needs to be done on the exit logic.
Also, I have one question if you can answer please:
Does the OPT file update continuously in order to improve the entries and exits over time by fine-tuning the policy itself?
Or does the EA just use the strategy tester to optimise its values and reuse the same entry and exit values that were profitable recently, like a regular optimised EA?
I mean, like other neural network EAs, does it fine-tune its overall policy for trade entries and exits during the course of trading?
Hello. Please wait for the next article. There I will examine the use of other indicators as well as a variety of agents, and without fuzzy logic.
Thanks for your quick reply, I will wait.
If you don't mind, could you please mention when you are planning to publish the next article with the indicator implementation code?
I think 1-2 weeks for the Russian version, and then they will translate it.
But even if the article is in Russian, it will still be available for download and use in regular MT5, right?
By the way, I have a few suggestions for improving the logic of the reward function. Since I am not 100% sure about the coding, I am not touching the code for now. But if you find my suggestions useful, please consider implementing them in the next release.
I want the reward function to reward the agent based on the sum of rewards from two factors:
1. Net profit (OrderProfit() + OrderCommission() + OrderSwap()): the net profit (NP) for a particular order close yields a reward (RW1) scaled by a reward factor (RF1) whose value is anything greater than 1, so that the bigger the trade profit, the bigger the reward, and the bigger the loss, the bigger the negative reward.
RW1 = NP * MathPow(RF1, 2); for example, RF1 can be 1.1, 1.2 or more.
That is, RW1 for a particular order close should be the net profit (positive or negative) multiplied by the square of the reward factor RF1.
2. Number of consecutive profits or losses (NCount): another reward (RW2) should be given based on consecutive profits or losses, determined by a reward factor (RF2) and the number of consecutive losses or profits (NCount).
Before giving a reward, the EA should check the last order's profit or loss for that particular symbol. If it is a net loss, considering OrderCommission and OrderSwap, a more negative reward is given; if the order closed in profit, a more positive reward is given.
RW2 = MathPow(NCount, RF2); RF2 can be 1.1, 1.2 or more.
For example, if the current closed order is positive and the previous order is negative, then NCount = 1 - 1 = 0 and RW2 will be zero. If the current closed order is positive and the previous order is also positive, then NCount = 1 + 1 = 2, and so RW2 = 2^1.2, considering RF2 = 1.2.
The net reward (RW) for a particular order close should then be RW = RW1 + RW2.
If you can implement this in the next version of the EA, that would be great; or if you can just tell me where to add this code, I will try to do it myself.
The best part, I think, would be declaring RF1 and RF2 as global input variables for optimisation, so the EA could find the best combination of RF1 and RF2 during forward testing.
Hi Maxim Dmitrievsky,
Sorry to ask so many questions... I found this article very useful while studying and implementing machine learning in MQL5. Of course, I am waiting for your next article.
My question is: how do I train the agent to develop a policy that maximises profit, NOT the number of profitable trades?
I studied the updateReward() and updatePolicy() code, and it seems to me to optimise only the number of profitable trades, ignoring the profit of each trade and whether the account balance is growing or not.
So could you please shed some light on how to integrate the profit amounts into the reward function, and is there a way to do that?
I tried to implement my own code proposed above, but apparently it doesn't work (though there was no coding error), or I don't know how to implement it. Or perhaps I did not completely understand what exactly the updateReward() and updatePolicy() functions do. I would really appreciate it if you could explain a little more about the code inside those two functions: what exactly RDFPolicyMatrix stores, and how its values are used during the next trade entries.
Thanks in advance.
I see the best results of the EA immediately after optimisation, and as the agent trains itself, the results become worse day by day. So I was wondering if it is possible to launch the optimiser from the code after every loss: instead of updating the reward on every loss, the EA would re-optimise itself on the last couple of days of data up to today.
If either the author or anyone else knows how to implement this, then kindly share it.
Dear Maxim Dmitrievsky,
Could you please let us know whether you have published your next article regarding the implementation of Random Decision Forest with different agents and without fuzzy logic, which you mentioned previously?
Thank you very much