Discussion of article "Applying Monte Carlo method in reinforcement learning"

MetaQuotes Software Corp.
Moderator
204874

New article "Applying Monte Carlo method in reinforcement learning" has been published:

In this article, we will apply reinforcement learning to develop self-learning Expert Advisors. In the previous article, we considered the Random Decision Forest algorithm and wrote a simple self-learning EA based on reinforcement learning. The main advantages of such an approach (simplicity of trading algorithm development and high "training" speed) were outlined. Reinforcement learning (RL) is easily incorporated into any trading EA and speeds up its optimization.

After stopping the optimization, simply enable single test mode (the best model has been written to the file, so only that model will be loaded):


Let's scroll the history back two months and see how the model works over the full four months:


We can see that the resulting model lasted another month (almost all of September), but broke down in August.

Author: Maxim Dmitrievsky

jaffer wilson
1158

Thank you. Is it possible to make the training using the GPU instead of CPU?

Maxim Dmitrievsky
28579

Yes, if you rewrite all the logic (RF included) as OpenCL kernels :) Also, random forest is about the worst candidate for GPU execution and parallelization.

Mark Flint
2172

Thank you Maxim.

I have been playing with the code and introduced different types of data for the feature input vectors. I have tried Close, Open, High, Low, Typical Price, Tick Volumes and derivatives of these.

I was having problems with the Optimiser complaining about "some error after the pass completed", but I finally managed to track this down: the optimiser errors occur if the input vector data has any zero values.

Where I was building a derivative, e.g. Close[1]-Close[2], sometimes the close values are the same, giving a derivative of zero. For these types of input vector values I found the simplest fix was to add a constant, say 1000, to all vector values. This cured the optimiser errors and still allowed the RDF to function.

I have also noticed an unintended consequence of running the same tests over and over: the amount of curve fitting increases for the period tested. Sometimes it is better to delete the recorded RDF files and run the optimisation again.

I am still experimenting and have more ideas for other types of feature. 


Maxim Dmitrievsky
28579

Hi, Mark. It depends on the feature selection algorithm. In this article it divides one feature by another (using price returns) in the "recursive elimination func", so if you are using cl[1]-cl[2] and get "0" in the divisor, this "some error" can occur.
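A minimal sketch of that failure mode, written in Python rather than MQL5 and not taken from the article's code. The function names `ratio_feature` and `safe_ratio_feature`, the `eps` threshold, the 0.0 fallback and the sample closes are all assumptions for illustration; the only thing borrowed from the thread is that one price difference ends up divided by another.

```python
def ratio_feature(numerator_diff, denominator_diff):
    """Naive ratio of two price differences (hypothetical helper)."""
    return numerator_diff / denominator_diff


def safe_ratio_feature(numerator_diff, denominator_diff, eps=1e-10):
    """Same ratio, but fall back to 0.0 when the divisor is (near) zero.

    The fallback value and eps are assumptions, not the article's choice.
    """
    if abs(denominator_diff) < eps:
        return 0.0
    return numerator_diff / denominator_diff


# Made-up closes, newest first; closes[1] and closes[2] are equal on purpose.
closes = [1.1052, 1.1050, 1.1050, 1.1047]
diff_a = closes[0] - closes[1]
diff_b = closes[1] - closes[2]  # exactly 0.0, like Close[1]-Close[2] on a flat bar

try:
    ratio_feature(diff_a, diff_b)
    raised = False
except ZeroDivisionError:
    raised = True  # the unguarded ratio fails on the zero divisor

guarded = safe_ratio_feature(diff_a, diff_b)
```

Adding a constant to every value, as Mark did, sidesteps the same problem by keeping the divisor away from zero; an explicit guard like the one above is another option.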

Yes, different optimizer runs can differ, because random sampling is used and the RDF algorithm itself is random. To fix this you can call MathSrand(number_of_passes) in the expert's OnInit() function, or pass another fixed number.
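To illustrate why a fixed seed makes runs repeatable, here is a small Python analogue (not MQL5, and not the EA's sampling code). The helper `sample_training_indices` and its parameters are hypothetical; in the EA the corresponding step would be the MathSrand call in OnInit() mentioned above.

```python
import random


def sample_training_indices(n_samples, n_draws, seed):
    """Draw n_draws random indices in [0, n_samples) from a seeded generator."""
    rng = random.Random(seed)  # a fixed seed gives a fixed sequence of draws
    return [rng.randrange(n_samples) for _ in range(n_draws)]


# Two "runs" with the same seed pick exactly the same samples.
run_a = sample_training_indices(1000, 5, seed=42)
run_b = sample_training_indices(1000, 5, seed=42)

# A different seed starts a different pseudo-random sequence.
run_c = sample_training_indices(1000, 5, seed=43)
```

With the seed fixed, any remaining difference between runs comes from the inputs rather than from the random sampling.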
