Machine learning in trading: theory, models, practice and algo-trading - page 952

 
Dr. Trader:

For the last file, this is what I got with the tree:

2016, training


y_true \ y_pred       -1        0        1
-1                 13392    38844     4472
 0                 10803    76714     6029
 1                  7413    37678     7415


2015, test:


y_true \ y_pred       -1        0        1
-1                  9552    39262     5429
 0                 11495    72131     7509
 1                  8581    40377     6835

When the model predicts "-1", the true class is -1 only slightly more often than 1, but 0 is the most common outcome, so it will probably all end in losses. The same holds for the "1" class.
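The claim above can be checked by normalising each predicted-class column of the 2015 test matrix. Below is a minimal Python sketch. It assumes the run-together digits of the test table split into the counts shown (rows are y_true, columns are y_pred), and the function and variable names are illustrative only.

```python
# Confusion matrix for the 2015 test file: rows = y_true, columns = y_pred.
# Assumption: the run-together digits in the post split into these counts.
labels = [-1, 0, 1]
test_matrix = [
    [9552, 39262, 5429],   # y_true = -1
    [11495, 72131, 7509],  # y_true = 0
    [8581, 40377, 6835],   # y_true = 1
]

def column_shares(matrix, col):
    """Share of each true class among the rows predicted as column `col`."""
    column = [row[col] for row in matrix]
    total = sum(column)
    return [count / total for count in column]

shares_by_pred = {pred: column_shares(test_matrix, i) for i, pred in enumerate(labels)}

for pred in (-1, 1):
    shares = shares_by_pred[pred]
    text = ", ".join(f"true {t}: {s:.1%}" for t, s in zip(labels, shares))
    print(f"predicted {pred}: {text}")
```

Under that split, a "-1" prediction is right roughly 32% of the time and a "1" prediction roughly 35% of the time, with true class 0 taking the largest share of both columns.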


The problem turned out to be the tree. The genetic algorithm chose the tree parameter cp = 0, which allows the tree to grow a huge number of branches. That was a mistake: we should have bounded this parameter from below by some small non-zero value.

Can the picture be shown in terms of probabilities, like before? Maybe there are more significant branches on the test data?

Dr. Trader:

I don't think there are enough predictors in the data to classify "0". We need some flat indicators, for example.

In general, the tree does badly. SanSanych's forest is much better.


Bad model settings and, as a consequence, overfitting.

He took only the 2016 file (by the way, the 2015 file had one predictor fewer; I have fixed that and can re-upload it), and 2016 was a year that trended up!

The tree clings to data from the higher timeframes (TFs), but there are too few observations on them, so odd results can appear on history when the global trend vector changes (2015 up, 2016 down) or when there is a prolonged flat (2017).

As for the flat: we have targets for entering the market out of a flat and targets for reversals. Could we try to separate them somehow?

The flat is identified well by predictors like Levl; the only problem is that the tree cannot link them together, at least within one TF.

 

You made a profitable robot there :)

Go long only when "1" is predicted (blue); more than 90% of those entries will be profitable (green).
Go short only when "-1" is predicted (red); again, more than 90% of those entries will be profitable (green).
A prediction of "0" means do not open new positions and wait for a better moment, so the actual accuracy for that predicted class does not matter at all.

But it would be better, for example, to train the forest on the 2015 file and check it on the 2016 file. The 2015 file is missing only one column; remove that column from the 2016 file so as not to confuse rattle.
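Aligning the two files before a train-on-2015, test-on-2016 run could look like the sketch below. The column names are hypothetical, since the thread does not list the real headers, and the rows are plain dicts rather than any particular data-frame type.

```python
# Hypothetical headers; the real files' column names are not given in the thread.
cols_2015 = ["open_lvl", "ma_fast", "target"]
cols_2016 = ["open_lvl", "ma_fast", "extra_predictor", "target"]

def common_columns(train_cols, test_cols):
    """Columns present in both files, in the training file's order."""
    test_set = set(test_cols)
    return [c for c in train_cols if c in test_set]

def project(rows, keep):
    """Keep only the `keep` columns of each row (rows are dicts)."""
    return [{c: row[c] for c in keep} for row in rows]

keep = common_columns(cols_2015, cols_2016)

# One made-up 2016 row, just to show the extra column being dropped.
rows_2016 = [{"open_lvl": 1.1, "ma_fast": 1.09, "extra_predictor": 7, "target": 1}]
aligned_2016 = project(rows_2016, keep)

print(keep)
print(aligned_2016)
```

Taking the intersection, rather than hard-coding the name of the extra column, keeps the two files consistent even if more than one column differs.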

 
Aleksey Vyazmikin:

Can the picture be shown in terms of probabilities, like before?

No, this is a different tree-training mode, suitable only for two classes or for regression.

 
SanSanych Fomenko:

What is so great about it? It is overfitting and nothing more. He doesn't have a single predictor that relates to his target variable; it is all noise. Moreover, he sits in rattle and, instead of checking for noise, posts files full of garbage here.

Yes, the targets have no clear relationship to the predictors; they only show the financial result of entering the market at a particular moment.

Do you think the result would be better if the entry logic were tied to the readings of some of the predictors? For example, if we enter on an MA crossover, we determine the result (1 or -1) and pass the fact of the MA crossover to the model as a predictor.
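One way to read this suggestion is to compute the crossover event as an explicit feature and to label only the bars where it fires. Below is a rough Python sketch of the feature side. The periods, the price series and the function names are made up for illustration, and a simple moving average stands in for whatever MA would actually be used.

```python
def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            window = values[i + 1 - period : i + 1]
            out.append(sum(window) / period)
    return out

def crossover_flags(prices, fast=2, slow=4):
    """+1 where the fast MA crosses above the slow MA, -1 where it crosses below, else 0."""
    f, s = sma(prices, fast), sma(prices, slow)
    flags = [0] * len(prices)
    for i in range(1, len(prices)):
        if None in (f[i - 1], s[i - 1], f[i], s[i]):
            continue  # not enough history for both averages yet
        prev_diff, diff = f[i - 1] - s[i - 1], f[i] - s[i]
        if prev_diff <= 0 < diff:
            flags[i] = 1
        elif prev_diff >= 0 > diff:
            flags[i] = -1
    return flags

# Made-up prices: a dip, a recovery, then another drop.
prices = [10, 9, 8, 7, 8, 9, 10, 9, 7, 6]
flags = crossover_flags(prices)
print(flags)
```

The training sample would then keep only the bars with a non-zero flag, with the flag itself as one predictor and the trade result (1 or -1) as the target.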

 
Aleksey Vyazmikin:

The tree clings to data from the higher timeframes (TFs), but there are too few observations on them, so odd results can appear on history when the global trend vector changes (2015 up, 2016 down) or when there is a prolonged flat (2017).

As for the flat: we have targets for entering the market out of a flat and targets for reversals. Could we try to separate them somehow?

The flat is identified well by predictors like Levl; the only problem is that the tree cannot link them together, at least within one TF.

I see, so there are already various flat indicators, but the tree cannot link them together. Then that is probably it: this is the limit of what the tree can do.

My results yesterday were almost the same in terms of accuracy, but with fewer trade entries. What I got today is not much better. Something has gone wrong, and I will think about which settings could be corrected.

 
Aleksey Vyazmikin:

Yes, the targets have no clear relationship to the predictors; they only show the financial result of entering the market at a particular moment.

Do you think the result would be better if the entry logic were tied to the readings of some of the predictors? For example, if we enter on an MA crossover, we determine the result (1 or -1) and pass the fact of the MA crossover to the model as a predictor.

Here is my opinion: garbage in, garbage out! These are the first lines in statistics textbooks.

 
Dr. Trader:

You made a profitable robot there :)

Go long only when "1" is predicted (blue); more than 90% of those entries will be profitable (green).
Go short only when "-1" is predicted (red); again, more than 90% of those entries will be profitable (green).
A prediction of "0" means do not open new positions and wait for a better moment, so the actual accuracy for that predicted class does not matter at all.

But it would be better, for example, to train the forest on the 2015 file and check it on the 2016 file. The 2015 file is missing only one column; remove that column from the 2016 file so as not to confuse rattle.

I didn't build anything: I took a ready-made file and ran randomForest on it, but I was too lazy to split it into two files. Aleksey did that for me and showed a killer result that completely wipes out my "achievements".

 
Dr. Trader:

I see, so there are already various flat indicators, but the tree cannot link them together. Then that is probably it: this is the limit of what the tree can do.

My results yesterday were almost the same in terms of accuracy, but with fewer trade entries. What I got today is not much better. Something has gone wrong, and I will think about which settings could be corrected.

Yes, I think we need a tree that can be helped: one where you can indicate the probable relationships between predictors and set conditions for the comparisons used in decision making.

But how do you explain to the tree that the global trend is up or down? I could, of course, add such a dummy variable, draw a channel, compute percentages, that is, clearly indicate where the trend vector points. But the tree may simply ignore this predictor, whereas in my opinion it should split the whole sample into at least two groups according to the global trend vector.

I don't know; maybe the sample should be split by situation and a tree trained on each part, and then the Expert Advisor should identify the global trend itself and listen to one tree or the other depending on the vector.
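The "one tree per trend regime" idea could be sketched roughly as follows. Everything here is an assumption made for illustration: the `trend_slope` field, the sign-based regime rule, and the `MajorityModel` stub that stands in for a real tree.

```python
from collections import Counter

class MajorityModel:
    """Stand-in for a tree: always predicts the most common training label."""
    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label

def trend_regime(row):
    """Hypothetical regime rule: sign of a precomputed global-trend slope."""
    return "up" if row["trend_slope"] > 0 else "down"

def fit_by_regime(rows, labels):
    """Split the sample by regime and train one model on each part."""
    parts = {}
    for row, label in zip(rows, labels):
        part_rows, part_labels = parts.setdefault(trend_regime(row), ([], []))
        part_rows.append(row)
        part_labels.append(label)
    return {regime: MajorityModel().fit(r, l) for regime, (r, l) in parts.items()}

def predict_routed(models, row):
    """Use the model trained for the row's regime; stay out (0) if there is none."""
    model = models.get(trend_regime(row))
    return model.predict(row) if model else 0

# Made-up sample: two up-trend rows, three down-trend rows.
rows = [{"trend_slope": s} for s in (0.3, 0.2, -0.4, -0.1, -0.2)]
labels = [1, 1, -1, 0, -1]
models = fit_by_regime(rows, labels)
print(predict_routed(models, {"trend_slope": 0.5}),
      predict_routed(models, {"trend_slope": -0.5}))
```

Routing outside the model forces the split on the trend vector that a single tree is free to ignore, at the cost of leaving each model with a smaller training sample.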

 
SanSanych Fomenko:

That's my point: garbage in, garbage out! These are the first lines in statistics textbooks.

It's not about garbage. The input is really a set of probabilities of the event's outcome; these probabilities are influenced by the predictors, and the output results from many different, independent events, even though their outcome may be the same. I'll think about defining a clear entry signal and removing all cases that have no entry signal; it will be interesting to see the result. The thing is, I still haven't heard any feedback from the participants here: is it necessary to mark the entry explicitly in the predictors if different entry strategies are used?

 

We are all looking for entry points, but maybe we should try looking for the flat instead?

Does anyone have an indicator/script for identifying flats on history?

I think we could take a regression channel with a period of 100, shift it on each bar, and, if the absolute slope is below some threshold X, consider the area covered by the channel a flat. What do you think?
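The regression-channel idea could be prototyped as in the sketch below. It assumes "flat" means the least-squares slope over the trailing window stays within ±X. The window length of 100 comes from the post; the threshold value, the synthetic price series and the function names are made up.

```python
def ols_slope(values):
    """Least-squares slope of values against bar index 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def flat_flags(closes, window=100, max_abs_slope=0.01):
    """True for bars whose trailing `window`-bar regression slope is within ±max_abs_slope."""
    flags = [False] * len(closes)
    for end in range(window, len(closes) + 1):
        slope = ols_slope(closes[end - window : end])
        flags[end - 1] = abs(slope) < max_abs_slope
    return flags

# Synthetic series: 100 bars of steady rise, then 100 perfectly flat bars.
closes = [1.0 + 0.05 * i for i in range(100)] + [6.0] * 100
flags = flat_flags(closes, window=100, max_abs_slope=0.01)
print(flags[99], flags[-1])
```

This version flags only the last bar of each window, which is a simplification of "the area covered by the channel". The slope is in price units per bar, so X has to be scaled to the instrument and timeframe, and a slope test alone will also pass a wide, choppy range, so a limit on channel width may be worth adding.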
