Discussion of article "Random Decision Forest in Reinforcement learning" - page 3

 
FxTrader562:

Thank you for sharing this very useful article.

I was trying to add more indicators to the code, but I am not an expert programmer and have little experience with membership functions, so I couldn't work out how to add more indicators to be used along with the rules inside the OnInit() function. The code contains only the RSI indicator and builds the BUY and SELL rules from it. Could you please provide a few more example snippets for indicators such as Moving Average, MACD, Stochastic or SAR?

In particular, I want to know how to create rules and add them to the entry conditions while comparing them with the current price. The main problem with the current code is that it sometimes holds losing trades for a long time while closing profitable trades quickly, so any advice on this will be appreciated. I think the exit logic needs more filtering.

Also, I have one question, if you can answer it please:

Does the OPT file update continuously, improving the entries and exits over time by fine-tuning the policy itself?

Or does the EA just use the strategy tester to optimise its values and then reuse the entry and exit values that were recently profitable, like a regular optimised EA?

I mean, like other neural network EAs, does it fine-tune its overall policy for trade entries and exits during the course of trading?

Hello. Please wait for the next article. There I will examine how other indicators work, as well as a variety of agents, and without fuzzy logic.

 
Maxim Dmitrievsky:

Hello. Please wait for the next article. There I will examine how other indicators work, as well as a variety of agents, and without fuzzy logic.

Thanks for your quick reply... I will wait.

If you don't mind, could you please mention when you are planning to publish the next article with the indicator implementation code?

 
FxTrader562:

Thanks for your quick reply... I will wait.

If you don't mind, could you please mention when you are planning to publish the next article with the indicator implementation code?

I think 1-2 weeks for the Russian version, and then they will translate it.

 

But even if the article is in Russian, it will still be available to download and use in regular MT5, right?

By the way, I have a few suggestions for improving the logic of the reward function. Since I am not 100% sure about the coding, I am not touching the code for now, but if you find my suggestions useful, please consider implementing them in the next release.

I want the reward function to reward the agent based on the sum of rewards from two factors:

1. Net profit (OrderProfit()+OrderCommission()+OrderSwap()): the reward (RW1) for a particular order close is scaled by a reward factor (RF1) whose value is anything greater than 1, so the bigger the trade profit the bigger the reward, and the bigger the loss the bigger the negative reward.

RW1=NP*MathPow(RF1,2);  For example, RF1 can be 1.1, 1.2 or more.

That is, RW1 for a particular order close should be the net profit NP (positive or negative) multiplied by the square of the reward factor (RF1).

2. Number of consecutive profits or losses (NCount): another reward (RW2) should be given based on the reward factor (RF2) and the number of consecutive profits or losses (NCount).

Before giving a reward, the EA should check the result of the last order for that particular symbol; if it is a net loss after OrderCommission() and OrderSwap(), a larger negative reward is given, and if the order closed in profit, a larger positive reward is given.

RW2=MathPow(NCount,RF2);  RF2 can be 1.1, 1.2 or more.

For example, if the current closed order is positive and the previous order is negative, then NCount=1-1=0 and RW2 is zero. If the current closed order is positive and the previous order is also positive, then NCount=1+1=2, so RW2=2^1.2, assuming RF2=1.2.

The net reward (RW) for a particular order close should then be RW=RW1+RW2.

It would be great if you could implement this in the next version of the EA, or if you could just tell me where to add this code; then I will try to do it myself.

The best part, I think, would be declaring RF1 and RF2 as input variables for optimisation, so the EA can find the best combination of RF1 and RF2 while forward testing.
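
A minimal sketch of how this reward scheme could look in code, assuming NP is the net result of the closed order (OrderProfit()+OrderCommission()+OrderSwap() as above) and that a running counter of consecutive wins/losses is kept; RF1 and RF2 are declared as optimisable inputs as suggested. All names here are illustrative, not taken from the article's code:

input double RF1=1.2;   // reward factor for the net profit term
input double RF2=1.2;   // reward factor for the winning/losing streak term
int NCount=0;           // running count of consecutive wins (+1) / losses (-1)

// Call once after an order is closed, passing its net result NP.
double CalcReward(const double NP)
  {
   // 1) profit-scaled reward
   double RW1=NP*MathPow(RF1,2);

   // 2) streak reward: grow the counter on a win, shrink it on a loss
   NCount+=(NP>0.0) ? 1 : -1;
   double RW2=0.0;
   if(NCount>0)
      RW2= MathPow(NCount,RF2);    // reward a winning streak
   else if(NCount<0)
      RW2=-MathPow(-NCount,RF2);   // punish a losing streak

   return(RW1+RW2);                // net reward RW=RW1+RW2
  }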

 

Hi Maxim Dmitrievsky,

Sorry for asking so many questions... I found this article very useful while studying and implementing machine learning in MQL5. Of course, I am waiting for your next article.

My question is: how do I train the agent to develop a policy that maximises profit, NOT the number of profitable trades?

I studied the updateReward() and updatePolicy() code, and it seems to optimise only the number of profitable trades, ignoring the profit of each trade and whether the account balance is growing or not.

So could you please shed some light on how to integrate the profit amounts into the reward function? Is there a way to do that?

I tried to implement my own code as proposed above, but it probably doesn't work (though there was no compilation error), or I don't know how to implement it properly. Or perhaps I did not completely understand what exactly the updateReward() and updatePolicy() functions do. I would really appreciate it if you could explain a little more about the code inside those two functions, what exactly the RDFPolicyMatrix stores, and how those values are used for the next trade entries.

Thanks in advance.

 

I see the best results from the EA immediately after optimisation, and as the agent trains itself the results become worse day by day. So I was wondering if it is possible to launch the optimiser from the code after every loss. I mean, instead of updating the reward for every loss, the EA should re-optimise itself on the last couple of days of data up to today.

If the author or anyone else knows how to implement this, please share it.

 
Maxim Dmitrievsky:
It's not so much about the number of trees, but yes, training multiple forests on the same data does help, because the construction process is randomised and the results may vary. I was surprised that an ensemble of forests gives a noticeable improvement, i.e. train several forests (5-15 of them) on the same data and average the results. You can also use different settings for each. Boosting should give even better results, but I haven't got there yet.

Thanks for the article; it is informative and there is a lot to take away from it (at least for me).

As I understand it, fuzzy logic was taken just as an example for the article. Nothing prevents getting the value of ts directly (I have implemented it and didn't notice any particular difference in efficiency - the forest completely substitutes the fuzzy logic). It can be considered another way of rewarding the agent. It seems to me (though I can't confirm it programmatically) that increasing the number of optimised membership functions will not give any gain; even now the forest overrides the fuzzy logic. I tried averaging the results of several forests (including the fuzzy one as in the article) and the result improved; after the neural networks from AlgLib I was surprised by how fast it trains on several years of data. To make the forests noticeably different, I used the second form of forest creation, explicitly specifying a different number of indicators for each (not to mention varying the composition of the indicators):

// Two forests built on the same data but with a different number of randomly
// selected features per split (7 and 5); each forest goes into its own slot.
CDForest::DFBuildRandomDecisionForestX1(RDFpolisyMatrix,numberOfsamples,iNeuronEntra,iNeuronSal,number_of_trees,7,regularization,RDFinfo,RDF[0],RDF_report);
CDForest::DFBuildRandomDecisionForestX1(RDFpolisyMatrix,numberOfsamples,iNeuronEntra,iNeuronSal,number_of_trees,5,regularization,RDFinfo,RDF[1],RDF_report);
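
To combine such differently configured forests, their outputs can simply be averaged. Below is a minimal sketch, assuming the RDF[] array and the iNeuronEntra/iNeuronSal sizes from the snippet above; DFProcess is the standard AlgLib inference call, everything else is illustrative:

// Sketch: average the outputs of the forests built above.
// 'features' must hold iNeuronEntra input values; 'out' receives iNeuronSal
// averaged outputs (class probabilities for a classification forest).
void EnsemblePredict(double &features[],double &out[],const int n_forests=2)
  {
   double y[];
   ArrayResize(y,iNeuronSal);
   ArrayResize(out,iNeuronSal);
   ArrayInitialize(out,0.0);
   for(int f=0; f<n_forests; f++)
     {
      CDForest::DFProcess(RDF[f],features,y);   // run one forest
      for(int k=0; k<iNeuronSal; k++)
         out[k]+=y[k]/n_forests;                // accumulate the simple average
     }
  }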

Can you tell me what other forms of reward can be tried? I'm interested in creating an ensemble of forests with different forms of reward. By the way, is the ensemble just the averaging, or is there a special formula for combining the results? As far as I can tell, AlgLib doesn't have ensembles of forests?

This may be useful to someone: it is inconvenient to store the data in a heap of files when there are several forests, so I decided to do it this way:

StructToHiden strHider;
strHider.iNeuronEntra=iNeuronEntra;
strHider.iNeuronSal=iNeuronSal;
strHider.iTrees=number_of_trees;
DataTekOpt=TimeLocal();
strHider.DataTekOpt=DataTekOpt;

// save: fixed-size header first, then the trees array
int filehnd=FileOpen(NameFile+"_fuzzy.txt",FILE_WRITE|FILE_BIN|FILE_COMMON);
strHider.iBufsize=RDFfuzzy.m_bufsize;
FileWriteStruct(filehnd,strHider);        // write the file header
FileWriteArray(filehnd,RDFfuzzy.m_trees); // write the trees
FileClose(filehnd);

// load: read the header back, restore the forest fields, then the trees array
int filehnd=FileOpen(NameFile+"_fuzzy.txt",FILE_READ|FILE_BIN|FILE_COMMON);
FileReadStruct(filehnd,strHider);         // read the file header
RDFfuzzy.m_nvars=strHider.iNeuronEntra;
RDFfuzzy.m_nclasses=strHider.iNeuronSal;
RDFfuzzy.m_ntrees=strHider.iTrees;
RDFfuzzy.m_bufsize=strHider.iBufsize;
DataTekOpt=strHider.DataTekOpt;
FileReadArray(filehnd,RDFfuzzy.m_trees);  // read the trees
FileClose(filehnd);
The header can be described by any structure; since its length is fixed, it is possible to store both it and the forest in one file (the structure must come first). One forest - one file.
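
For reference, a minimal sketch of what such a fixed-size header structure might look like; the field names mirror the snippet above, but the actual declaration in the original code may differ:

// Hypothetical header structure (illustrative; the real declaration may differ).
// It contains only simple types, so FileWriteStruct/FileReadStruct can handle it.
struct StructToHiden
  {
   int      iNeuronEntra;   // number of forest inputs  (m_nvars)
   int      iNeuronSal;     // number of forest classes (m_nclasses)
   int      iTrees;         // number of trees          (m_ntrees)
   int      iBufsize;       // size of the m_trees buffer (m_bufsize)
   datetime DataTekOpt;     // time of the last optimisation
  };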

Thanks again for the article; thanks to it I started studying AlgLib seriously.
 
mov:

Thanks for the article; it is informative and there is a lot to take away from it (at least for me).

As I understand it, fuzzy logic was taken just as an example for the article. Nothing prevents getting the value of ts directly (I have implemented it and didn't notice any particular difference in efficiency - the forest completely substitutes the fuzzy logic). It can be considered another way of rewarding the agent. It seems to me (though I can't confirm it programmatically) that increasing the number of optimised membership functions will not give any gain; even now the forest overrides the fuzzy logic. I tried averaging the results of several forests (including the fuzzy one as in the article) and the result improved; after the neural networks from AlgLib I was surprised by how fast it trains on several years of data. To make the forests noticeably different, I used the second form of forest creation, explicitly specifying a different number of indicators for each (not to mention varying the composition of the indicators):

Can you tell me what other forms of reward can be tried? I'm interested in creating an ensemble of forests with different forms of reward. By the way, is the ensemble just the averaging, or is there a special formula for combining the results? As far as I can tell, AlgLib doesn't have ensembles of forests?

This may be useful to someone: it is inconvenient to store the data in a heap of files when there are several forests, so I decided to do it this way:

The header can be described by any structure; since its length is fixed, it is possible to store both it and the forest in one file (the structure must come first). One forest - one file.

Thanks again for the article; thanks to it I started studying AlgLib seriously.

Yes, fuzzy logic was just an example. I've already given up on it myself, because after various experiments it turned out that it doesn't make much sense to use it in this form.

You can try using the current Sharpe ratio or R^2 of the deals as the reward. Or you can evaluate not the results of the trades at all but, for example, the market conditions: if you bought and the trend kept growing for some time, the signal is considered suitable. Something like that.
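
As an illustration of the first idea, here is a minimal sketch of a Sharpe-ratio-style reward computed over the results of the last closed deals. The function name and the way the profits array is filled are assumptions, not part of the article's code:

// Illustrative sketch: reward = Sharpe ratio of the recent deal results.
// 'profits' is expected to hold the net results of the last N closed deals.
double SharpeReward(const double &profits[])
  {
   int n=ArraySize(profits);
   if(n<2)
      return(0.0);
   double mean=0.0;
   for(int i=0;i<n;i++)
      mean+=profits[i];
   mean/=n;
   double var=0.0;
   for(int i=0;i<n;i++)
      var+=MathPow(profits[i]-mean,2);
   double stdev=MathSqrt(var/(n-1));
   if(stdev==0.0)
      return(0.0);
   return(mean/stdev);   // higher reward for steady, positive results
  }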

By ensemble I meant simple averaging of the results, yes. But for each forest you can set its own predictors and/or targets.

I'm planning to write an additional article with an ensemble of agents + some more perks, in the form of a class. I'll probably finish it soon.

One more important point not touched on in this article is the estimation of the forest classification errors on the training and out-of-bag (OOB) samples; this will also be described.
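
For those who want to look at this already, the report structure filled by the AlgLib forest builder contains these error estimates. A minimal sketch, assuming the RDF_report variable from the snippet above and the usual member names of the MQL5 AlgLib port (check them against your copy of dataanalysis.mqh):

// Sketch: compare the in-sample and out-of-bag classification errors
// after DFBuildRandomDecisionForestX1 has filled RDF_report.
Print("training classification error:   ",RDF_report.m_relclserror);
Print("out-of-bag classification error: ",RDF_report.m_oobrelclserror);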

Thanks for the code samples; there is also the option of saving through serialisation.

 
Maxim Dmitrievsky: + some more perks, in the form of a class. I will probably finish it soon.

If possible, please consider as an example a function for training on history when the Expert Advisor is launched (trade emulation), without using the strategy tester. I tried to do it myself, but apparently my hands are crooked: it works, but the training is much less effective than training in the strategy tester.

 
mov:

If possible, please consider as an example a function for training on history when the Expert Advisor is launched (trade emulation), without using the strategy tester. I tried to do it myself, but apparently my hands are crooked: it works, but the training is much less effective than training in the strategy tester.

Yes, a virtual tester is also in the plans, but other aspects need to be refined first; automatic selection and reduction of predictors is the most important right now, so that the model is not so heavily overfitted to history.