Discussion of article "Random Decision Forest in Reinforcement learning" - page 6

 
mov:

Good afternoon,

I am posting the results of some experiments (obtained on pure trees without fuzzy logic; I intended to include them in a new article, but since the discussion of reward functions is continuing, I am posting them here as food for thought and discussion).

1. It seemed wrong to me that, for example, for SELL the random value is drawn from the whole interval 0..1, even though we already know that selling is unprofitable.

By restricting the ranges for the opposite and uncertain actions, the learning speed increases many times over. After 2-3 runs (with a pass over random data, I believe) the training quality matches 4-6 runs of the old variant (the scatter is wide because of many additional factors, but the efficiency gain is far more than a few tens of per cent).
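The idea above can be sketched roughly as follows. This is a hedged Python illustration, not the article's MQL5 code; the function name, the action set, and the 0..0.4 range are my own assumptions chosen only to show the principle.

```python
import random

def init_action_value(action, losing_actions, lo=0.0, hi=0.4):
    """Draw an initial random value for an action's reward estimate.

    Instead of seeding every action uniformly on 0..1, actions already
    known to be unprofitable (e.g. SELL in a rising market) are drawn
    from a restricted low range, so far fewer training passes are
    needed to rank them below the profitable actions.
    """
    if action in losing_actions:
        return random.uniform(lo, hi)    # restricted range for known-bad actions
    return random.uniform(0.0, 1.0)      # full range otherwise
```

The narrower the restricted range, the less random noise the forest has to average away before the ranking of actions stabilises.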

2. In the initial implementation I found it strange that a randomly obtained value serves as the reinforcement. This easily creates a situation where a strong trend receives a lower reward.

My first attempt to get away from this:

Idea: for a profit of 100 pips and above the reward is 1; below that it increases linearly (in this case from 0.61). The example is given for selling; for buying it is similar, with other levels. Theoretically, a stronger trend gets a higher reward. The results improved, but only slightly beyond statistical error. At the same time, the file with the tree for the same conditions shrank considerably in size. Apparently this peculiar sorting of the results allowed the rules to be described more simply.
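The reward shaping described above can be written as a small piecewise-linear function. This is a Python sketch under my own reading of the post (cap of 100 pips, floor of 0.61); the actual EA code may differ.

```python
def shaped_reward(profit_pips, cap=100.0, floor=0.61):
    """Map trade profit in pips to a reward in [floor, 1.0].

    Profits at or above `cap` pips get the full reward 1.0; smaller
    profits scale linearly up from `floor`, so a stronger trend
    earns a strictly higher reward instead of a random one.
    """
    if profit_pips >= cap:
        return 1.0
    return floor + (1.0 - floor) * max(profit_pips, 0.0) / cap
```

For example, a 50-pip profit lands halfway between the floor and 1.0, at 0.805.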

To test the ensemble of trees, I decided to load the evaluation of a single tree

and, out of habit, ran the training. To my surprise, the comparable training with the coarsened reward function showed a significant improvement: on the training interval, all other things being equal, the profit over 4 months exceeded the old variant's profit over 6 months (I speak in comparisons because the specific figures vary greatly with the training conditions, the pair, and the crookedness of the coder's hands). Most interestingly, the results on the control interval also improved. Loading the evaluation function improved the prediction! A professional statistician would probably find nothing new here and could prove with formulas that it must be so, but for me it was a shock; as they say, I will have to get used to it. This raises the question of further selection and evaluation of prediction functions.

I hope the time I spent on these tests will help someone shorten their own search at least a little (or give them the chance to make new mistakes, which they will share with us).

Interesting results, thanks. I did try the 1st point; yes, the learning speed increases. In fact, the class I'm working on now effectively learns in 1 iteration already, i.e. it is a super-efficient algorithm in terms of learning speed, but the internal preprocessing is still worth organising so that the model does not overfit too much.

From the highlighted part I don't understand what it means to load the evaluation of a single tree. Do you mean that you give only 0 and 1 as class labels? In that case we can assume that the variance of values within the model decreased and the quality increased as a consequence. The algorithm was originally designed for a large number of iterations, but in practice that turns out to be unnecessary, and such a coarsening may be correct.
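The coarsening being discussed — collapsing continuous reward values into plain 0/1 class labels — can be sketched in a few lines. This is my own Python illustration of the variance-reduction idea, not code from the article; the threshold of 0.5 is an assumption.

```python
def coarsen_labels(rewards, threshold=0.5):
    """Collapse continuous reward values into binary class labels.

    Replacing the raw, partly random reward values with plain 0/1
    labels reduces the variance of the targets the forest has to
    fit, which is one plausible reason the coarsened reward
    function generalised better in the experiments above.
    """
    return [1 if r >= threshold else 0 for r in rewards]
```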

 
Igor Makanu:

And how realistic is it to teach this code, kindly provided by the author of the article, the simplest patterns of 3-5 bars?

PS: hmm, even under alcohol I'm writing like on Aliexpress to a Chinese seller ))))

It's entirely doable: you feed patterns of 3-5 bars to the input.

 
Maxim Dmitrievsky:

It's entirely doable: you feed patterns of 3-5 bars to the input.

I'll be brief..... how?

 
Igor Makanu:

I'll be brief..... how?

by hand )

 
Maxim Dmitrievsky:

I don't understand what it means to load the evaluation of a single tree. Do you mean that you give only 0 and 1 as class labels?

Yes, the test was performed with one tree and labels 0 and 1. (On an ensemble of similar trees the result is even higher).
 
mov:
Yes, the test was performed with one tree and labels 0 and 1. (On an ensemble of similar trees the result is even higher).
Sorry, of course it was one forest of trees (I always mean a forest, but colloquially I say tree; I will get rid of this habit).
 
mov:
Sorry, of course it was one forest of trees (I always mean a forest, but colloquially I say tree; I will get rid of this habit).

Yes, got it. But the forest can also be set to 1 tree, or you can pull out its tree-construction function. Though I don't know why that would be needed.

 

Initially I tried increasing the number of trees to 500 and 1000, but I noticed that adding more trees did not improve the results. Also, whenever I exceed 500 trees, the optimisation keeps crashing and does not create the Mtrees text files.

Also, I tested increasing the number of iterations from 50 to 100 and noticed that the best results occur between 20 and 25 iterations; anything beyond that doesn't make sense.

But I have to agree that the RSI period gives different results with different combinations:

hnd1 = iRSI(_Symbol, 0, 8, PRICE_CLOSE);   // fast RSI, period 8

hnd2 = iRSI(_Symbol, 0, 16, PRICE_CLOSE);  // medium RSI, period 16

hnd3 = iRSI(_Symbol, 0, 24, PRICE_CLOSE);  // slow RSI, period 24


So I thought of adding another one to MTrees for future use. But later I realised that the EA stores only the values of the last iteration. If that is the case anyway, could we change the period? I mean, the EA should be able to switch to another RSI period immediately if there is a loss.
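The period-switching idea above could be sketched as a simple rotation through the candidate periods. This is a hypothetical Python helper of my own, purely to make the proposal concrete; the function name and rotation rule are assumptions, not anything from the EA.

```python
def next_rsi_period_index(periods, current_idx, last_trade_profit):
    """Rotate to the next RSI period in the candidate list after a loss.

    If the last closed trade lost money, advance (with wrap-around) to
    the next period, e.g. 8 -> 16 -> 24 -> 8; otherwise keep the
    current period. A real EA would recreate the iRSI handle with the
    newly selected period.
    """
    if last_trade_profit < 0:
        return (current_idx + 1) % len(periods)
    return current_idx
```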

Also, I am not well versed in fuzzy logic, so I was wondering if someone could post the full source code without fuzzy logic. I would appreciate the correct code for RDF without fuzzy logic, plus one example indicator.

I'm just curious to see what happens to the results if we feed 20 to 30 indicator values as input to the agent and have it train automatically.

 
FxTrader562:

Initially I tried increasing the number of trees to 500 and 1000, but I noticed that adding more trees did not improve the results. Also, whenever I exceed 500 trees, the optimisation keeps crashing and does not create the Mtrees text files.

Also, I tested increasing the number of iterations from 50 to 100 and noticed that the best results occur between 20 and 25 iterations; anything beyond that doesn't make sense.


I'm just curious to see what happens to the results if we feed 20 to 30 indicator values as input to the agent and have it train automatically.

There are experimental results on the web: 100 trees gives the best recognition, 20-50 if prediction is needed. I tried 100, and the prediction gets worse.

I tried 15-19 indicators on the input, expecting that when the situation changes the forest will select the best ones during training. Already at 10 and above the results stop improving. Note that when building the forest, only half of the inputs are used for each tree (in this implementation of the forest). Theoretically, for classification tasks the square root of the number of inputs (rather than half) is said to be better, but I haven't tried it myself.
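The "half vs. square root" feature subsampling mentioned above can be illustrated with a short sketch. This is a generic Python illustration of per-tree feature selection, not the article's forest code; the function and rule names are my own.

```python
import math
import random

def sample_features(n_features, rule="sqrt"):
    """Pick the feature subset one tree of a random forest trains on.

    The forest discussed in the thread reportedly uses half of the
    inputs per tree; the common textbook default for classification
    is the square root of the feature count. Each tree seeing a
    different small subset is what decorrelates the trees.
    """
    if rule == "sqrt":
        k = max(1, round(math.sqrt(n_features)))
    else:                                  # "half", as in the discussed implementation
        k = max(1, n_features // 2)
    return random.sample(range(n_features), k)

# With 16 inputs: "half" gives 8 features per tree, "sqrt" gives 4.
```

The smaller "sqrt" subsets usually decorrelate the trees more strongly, which is why the rule is often preferred for classification.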

 
mov:

There are experimental results on the web: 100 trees gives the best recognition, 20-50 if prediction is needed. I tried 100, and the prediction gets worse.

I tried 15-19 indicators on the input, expecting that when the situation changes the forest will select the best ones during training. Already at 10 and above the results stop improving. Note that when building the forest, only half of the inputs are used for each tree. Theoretically, for classification tasks the square root of the number of inputs (rather than half) is said to be better, but I haven't tried it myself.

Thank you for your reply.

However, I want to know what happens if we feed all of them (15 to 20 indicators) at once, using NOT just a few indicators for the agent but all of them. Then train the agent on the past 1 year so that it can develop the best policy using all indicators. I mean that we should determine the agent's current state at every candle close from a larger set of indicator values.

What I have noticed so far is that one loss wipes out a series of small profits due to the lack of proper exit conditions, so both the entry and exit conditions need to be fine-tuned.

Can you please provide one example of indicator code without fuzzy logic, and show where to put the indicator in the current implementation of the code?

I tried adding the indicators inside the OnTick() function, but it did not help much. I am looking for complete sample code for the current version of the EA without fuzzy logic.