How to predict the future of a decision tree - General

[Deleted] 2016.02.16 20:37 #71

Alexey Volchanskiy:
A related question to everyone in the discussion. Do you work with tick data? I have long moved away from bar analysis, working exclusively on DSP methods

I use M1-bar bids with asks as well as Level2.

Alexey Volchanskiy 2016.02.16 20:40 #72

zaskok3:
I use M1-bar bids with asks as well as Level2.

Is L2 on MT5?

Alexey Burnakov 2016.02.16 20:42 #73

Vladimir Perervenko:

The article you are referring to is about regression. We are dealing with classification. Those are two big differences...

I still don't understand your question.

Good luck

Here it doesn't matter whether it is regression or classification; the issue is the same. That article just happens to be about regression.

Just to clarify: at what step are your training examples taken? Every bar (i.e. each row of the data array holds the inputs for the next consecutive bar), or every n bars, so that there is a time lag between rows?

I'm not just being nerdy, and I certainly don't want to discredit your work (your articles help me).

Let me explain my point with a practical example, without snatching quotes from statistical studies:

In a decision tree you will have, say, m terminal nodes. Each node contains cases with similar input vectors, i.e. a subspace of input values. So if your examples are consecutive, shifted by one bar, and use inputs that look back several bars (in the worst case, hundreds of bars), there will be strong autocorrelation between neighboring points. At the same time, since we predict the future several bars ahead (in the worst case, again hundreds of bars), the neighboring outputs will be the same too. For example, the output column will consist of sequences like 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1. So dozens of identical outputs, belonging to adjacent and therefore similar inputs, will fall into the same terminal nodes. You could say there is a redundancy of identical examples clustered in time, which skews the distribution of responses in the most pronounced way. That is why there is a popular recommendation not to hold more than one position in the market: the dependence between neighboring entries and exits is also present when training an EA in the terminal.
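The clustering of identical outputs described here can be illustrated with a minimal Python sketch. It assumes a synthetic random-walk price series (not real quotes) and a simple up/down target; the helper names `make_labels` and `mean_run_length` are made up for this illustration.

```python
import random

def make_labels(prices, horizon):
    """Label bar t with 1 if the price `horizon` bars ahead is higher, else 0."""
    return [1 if prices[t + horizon] > prices[t] else 0
            for t in range(len(prices) - horizon)]

def mean_run_length(labels):
    """Average length of runs of identical consecutive labels."""
    runs = 1
    for prev, cur in zip(labels, labels[1:]):
        if cur != prev:
            runs += 1
    return len(labels) / runs

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumption: a plain Gaussian random walk stands in for the price series.
prices = [0.0]
for _ in range(5000):
    prices.append(prices[-1] + random.gauss(0.0, 1.0))

labels_h1 = make_labels(prices, horizon=1)    # predict 1 bar ahead
labels_h50 = make_labels(prices, horizon=50)  # predict 50 bars ahead

print("mean run length, horizon 1: ", mean_run_length(labels_h1))
print("mean run length, horizon 50:", mean_run_length(labels_h50))
```

With a one-bar horizon the labels flip almost like coin tosses, while with a long horizon they form long runs of identical values, so consecutive rows carry nearly the same example many times over.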

In this case there will be severe overfitting, or rather statistics built on observations that are not independent. That is, the most unpleasant thing you can run into when analyzing time series is dependence between neighboring data vectors. If the data vectors are far apart in time, it's OK. In that case, machine learning boils down to finding patterns that are invariant with respect to time.

And then, referring to the error matrix you give as an example in the article:

OOB confusion matrix:
          Reference
Prediction   -1    1 class.error
        -1 1066  280      0.2080
        1   254 1043      0.1958

Test set
Error rate: 19.97%

Confusion matrix:
          Reference
Prediction  -1   1 class.error
        -1 541 145      0.2114
        1  119 517      0.1871

I can only say that it is fantastic ) The experiment was conducted with an error. You can never get such a good error matrix on a sample of independent examples while also strictly separating the test dataset from the training dataset in time (to avoid look-ahead bias).

And the fact that the error matrix on the test set is also fantastic suggests that the test sample was mixed in time with the training sample, so that similar examples are again clustered together. That is, this particular result says nothing about the ability of the constructed model to predict the market.

You could take a bit more data, test on it using the tail(all_data, 1/3) logic, and see how the counts in the cells of the matrix even out. You can even apply the chi-square test to check whether the guessing has become almost random.
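A rough Python sketch of this check, under stated assumptions: `chronological_split` is a hypothetical helper that imitates the tail(all_data, 1/3) idea by holding out the most recent third of the rows, and `chi_square_2x2` computes the plain Pearson chi-square statistic (without continuity correction) for a 2x2 confusion matrix. The only data used is the test-set matrix quoted above.

```python
def chronological_split(rows, test_fraction=1 / 3):
    """Keep the oldest rows for training and the most recent ones for testing."""
    test_size = round(len(rows) * test_fraction)
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if prediction and reference were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Test-set confusion matrix quoted above (rows: prediction, columns: reference).
quoted = [[541, 145], [119, 517]]
stat = chi_square_2x2(quoted)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print("chi-square:", stat, "differs from random guessing:", stat > 3.841)

# Toy usage of the split: rows are assumed to be sorted by time.
train, test = chronological_split(list(range(9)))
print(train, test)
```

For the quoted matrix the statistic is well above 3.841, so those predictions are clearly not random on that test set. The question raised here is whether the same still holds once the matrix is recomputed on a test set cut off strictly by time.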

All that I wanted to convey to you, I have tried to do. Note, with good intentions)

Good luck! Alexey

[Deleted] 2016.02.16 23:10 #74

Alexey Volchanskiy:
L2 is on MT5?

MT4. The source code has been floating around on the forum...

Alexey Volchanskiy 2016.02.17 01:30 #75

zaskok3:
MT4. The source code has been floating around on the forum...

Friends and colleagues, I have a question.

How can you formulate an algorithm based on published trading data?

Alexey Volchanskiy 2016.02.17 01:32 #76

Alexey Volchanskiy:

Friends and colleagues, I have a question.

How can you formulate an algorithm based on published trading data?

I know I wrote it wrong - formulate, from the word formula)

[Deleted] 2016.02.17 07:56 #77

Alexey Volchanskiy:

How can you formulate an algorithm based on published trading data?

If you want to reverse-engineer the trading system (TS) from its statement, then use machine learning:

Take a bunch of indicator values as the inputs and the data from the statement as the output, then fit mathematical models.

I have not dealt with such nonsense myself.

СанСаныч Фоменко 2016.02.17 07:59 #78

Alexey Volchanskiy:
As a side note, I have a question for everyone in the discussion. Do you work with tick data? I moved away from bar analysis a long time ago and work exclusively with DSP methods

The use of DSP is highly questionable.

For tick data, cointegration ideas are more suitable.

СанСаныч Фоменко 2016.02.17 08:21 #79

Alexey Burnakov:
Here it doesn't matter whether it is regression or classification; the issue is the same. That article just happens to be about regression.

Sorry for meddling but it seems to be a public discussion.

Your post seems to me to be a mix of several related but different problems.

1. What do you teach the model? Trends? Level breakouts? Deviation from something? Choosing the teacher for the model seems very simple, but in practice it causes certain difficulties. In any case, the teacher (the vector the model is trained against) has to be prepared very specifically for the trading idea, for example, "I trade trends".

2. What do you teach on? In your post you mention the dependence between adjacent bars. Yes, there are tree-based models (CORELearn) that take dependencies between adjacent bars into account, but the problem you raise is much broader and nastier, and has little to do with the model used. It is model overfitting. As I see it, there are datasets that ALWAYS produce overfitted models, and no technique for eliminating overfitting helps here.

There are input datasets (sets of predictors) that contain predictors from which models that are NOT overfitted can be built. But the remaining predictors generate so much noise that the existing predictor selection packages cannot screen them out.

Therefore, a manual selection of predictors based on the criterion "seems to be relevant to our teacher, the target variable" is mandatory.

PS.

Funny to say, but when trading trends, any predictors obtained by smoothing, MAs in particular, are extremely noisy, and the models are always overfitted. And on OOV samples you can get an error of 5% as well!

Vladimir Perervenko 2016.02.17 09:52 #80

Alexey Burnakov:
Here it doesn't matter whether it is regression or classification; the issue is the same. That article just happens to be about regression.

Just to clarify: at what step are your training examples taken? Every bar (i.e. each row of the data array holds the inputs for the next consecutive bar), or every n bars, so that there is a time lag between rows?

The initial dataset is a matrix or dataframe containing the inputs and the target. When it is divided (stratified) into training and test sets, the examples are shuffled randomly, but the distribution of classes in each set is kept the same as in the original set. Therefore it is not possible to say at what step the examples are taken. You are apparently thinking of the vector-to-matrix transformation, where one can speak of a time lag.
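To make the two positions in this exchange concrete, here is a small Python sketch comparing a random shuffle split with a strictly chronological one. It is a simplification under stated assumptions: the row index is treated as the time index, the split is a plain shuffle without stratification (stratification only preserves class proportions and does not change the time mixing), and `share_with_train_neighbor` is a made-up helper name.

```python
import random

def share_with_train_neighbor(train_idx, test_idx):
    """Share of test rows whose immediate time neighbor (t-1 or t+1) is in training."""
    train = set(train_idx)
    hits = sum(1 for t in test_idx if (t - 1) in train or (t + 1) in train)
    return hits / len(test_idx)

n = 3000              # hypothetical number of consecutive bars
idx = list(range(n))  # assumption: row index doubles as the time index
cut = n * 2 // 3      # two thirds for training, one third for testing

# Random split: rows are shuffled before being divided.
random.seed(1)
shuffled = idx[:]
random.shuffle(shuffled)
rand_train, rand_test = shuffled[:cut], shuffled[cut:]

# Chronological split: the most recent third is held out.
time_train, time_test = idx[:cut], idx[cut:]

rand_share = share_with_train_neighbor(rand_train, rand_test)
time_share = share_with_train_neighbor(time_train, time_test)
print("random split:       ", rand_share)
print("chronological split:", time_share)
```

After a random shuffle, most test rows sit right next to a training row in time, whereas with a chronological cut only the row at the boundary does. If neighboring rows are nearly identical, the first kind of test set largely repeats the training data.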

I'm not just being nerdy and I certainly don't want to discredit your work (your articles help me).

I am far from thinking that. But I really can't understand the question.

Let me explain my thought with a practical example, without pulling quotes from statistical studies:

In a decision tree you will have, say, m terminal nodes. Each node contains cases with similar input vectors, i.e. a subspace of input values. So if your examples are consecutive, shifted by one bar, and use inputs that look back several bars (in the worst case, hundreds of bars), there will be strong autocorrelation between neighboring points. At the same time, since we predict the future several bars ahead (in the worst case, again hundreds of bars), the neighboring outputs will be the same too. For example, the output column will consist of sequences like 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1. So dozens of identical outputs, belonging to adjacent and therefore similar inputs, will fall into the same terminal nodes. You could say there is a redundancy of identical examples clustered in time, which skews the distribution of responses in the most pronounced way. That is why there is a popular recommendation not to hold more than one position in the market: the dependence between neighboring entries and exits is also present when training an EA in the terminal.

In this case there will be severe overfitting, or rather statistics built on observations that are not independent. That is, the most unpleasant thing you can run into when analyzing time series is dependence between neighboring data vectors. If the data vectors are far apart in time, it's OK. In that case, machine learning boils down to finding patterns that are invariant with respect to time.

And then, referring to the error matrix you give as an example in the article:

I can only say that it is fantastic ) The experiment was conducted with an error. You can never get such a good error matrix on a sample of independent examples while also strictly separating the test dataset from the training dataset in time (to avoid look-ahead bias).

And the fact that the error matrix on the test set is also fantastic suggests that the test sample was mixed in time with the training sample, so that similar examples are again clustered together. That is, this particular result says nothing about the ability of the constructed model to predict the market.

You could take a bit more data, test on it using the tail(all_data, 1/3) logic, and see how the counts in the cells of the matrix even out. You can even apply the chi-square test to check whether the guessing has become almost random.

Then please post an example that explains it in simple terms. Or do you think I haven't run such tests?

All that I wanted to convey to you, I have tried to do. Notice, with good intentions )

I really want to understand what you are trying to convey. I think it would be clearer with an example.

When you say that an experiment was conducted with an error, you should say what the error is and give the correct solution. You have the package and the examples; describe how you think the calculation should be carried out.

No offence.

Good luck
