Bayesian regression - Has anyone made an EA using this algorithm? - page 51

 
Дмитрий:
And how do you determine the "best" combination?
Cross-validation with 5 folds. Not the usual kind that selects examples at random, but one adapted to time series, with the examples separated in time. The blog describes it all.

The average value of the target metric over the 5 test folds indicates the best combination of training parameters.

The model is then trained on the whole training sample and validated on a separate sample.
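A minimal sketch of what a time-ordered 5-fold cross-validation of this kind can look like in R; the data frame dat, its target column and the use of randomForest are my assumptions, not necessarily the exact scheme from the blog:

    library(randomForest)

    # Hypothetical setup: dat is a data frame whose rows are ordered in time and
    # whose column "target" is a two-class factor.
    n     <- nrow(dat)
    folds <- cut(seq_len(n), breaks = 5, labels = FALSE)   # 5 contiguous blocks in time
    acc   <- numeric(5)

    for (k in 1:5) {
      test_idx  <- which(folds == k)                       # one time block held out
      train_idx <- which(folds != k)                       # the rest, never shuffled
      fit  <- randomForest(target ~ ., data = dat[train_idx, ])
      pred <- predict(fit, dat[test_idx, ])
      acc[k] <- mean(pred == dat$target[test_idx])
    }

    mean(acc)   # average metric over the 5 time-separated test folds

The key point is only that each test fold is a contiguous block of time rather than a random subset of rows.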
 
  • I'm wondering what else I should change in the experiment. I tried changing the example-indexing logic for cross-validation; no improvement.
  • Thinking of bringing the inputs to a discrete form.
  • Another option is to go down from a complex model to a simpler one. The complex model is a boosted decision forest; the simpler one is boosted linear regression, where the regularisation parameters can also be tuned.
  • But the point is that raising accuracy from 55% to 60% just by changing something in the design is difficult.
  • Building committee (ensemble) models will at best add a fraction of a percentage point, and the architecture takes longer to develop and train.
  • There is an idea to look at the important predictors: if they often sit at the edge of the window (724 minutes), the window could be extended to 1440 minutes, a full day. But then the number of samples would be halved, because I would take them at increments of 1440 plus or minus a random term.
  • Another target could also be predicted, for example a price-level breakout (Take Profit / Stop Loss) or a general "direction" such as a linear-regression slope (see the sketch below).
Everything takes time. But I will definitely devote another couple of months to the experiment. Maybe something will come of it.
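For the "direction" target mentioned in the last item, one possible formalisation is the slope of a linear regression fitted to the next H minutes of prices; a rough sketch, where the price vector and the horizon are my own assumptions:

    # Hypothetical: price is a numeric vector of minute closes, H is the forecast horizon in minutes.
    H <- 60

    slope_target <- function(price, i, H) {
      y <- price[(i + 1):(i + H)]      # the future window after bar i
      coef(lm(y ~ seq_len(H)))[2]      # slope of the fitted line, used as the "direction"
    }

    s   <- slope_target(price, 1000, H)            # target value for bar 1000
    cls <- factor(ifelse(s > 0, "up", "down"))     # or a sign-based class label instead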
 
The basic problem is that of the list of predictors. Having justified the list of predictors, we can move on to the rest.
 
СанСаныч Фоменко:
The basic problem is that of the list of predictors. Having justified the list of predictors we can move on to the rest.
Thank you. I, too, am leaning towards adding more predictors.
 
Alexey Burnakov:
Thank you. I'm leaning towards adding more predictors too.
Do you think the number of predictors you use is not enough?
 
Алексей Тарабанов:
Do you think the number of predictors you use is insufficient?
I don't know for sure.

Either the available predictors don't carry enough information,
or the relationships change a lot over time and generalisability decreases,
or the predictors themselves change their distribution parameters over time.

On the first point, more predictors can be added and the best ones can always be selected.
On the other points, preprocessing the data may help, but not completely.

I lean towards a combination of all these factors. The point is that even on cross-validation the accuracy is only 55-60%, and it falls as the forecast horizon increases. So the predictors don't give much information even on the training sample.

Whereas if I saw high accuracy on the test that dropped sharply on validation, with the experiment constructed correctly, that would mean the dependencies are non-stationary.
 
Alexey Burnakov:
I don't know for sure.

Either the available predictors don't carry enough information,
or the relationships change a lot over time and generalisability decreases,
or the predictors themselves change their distribution parameters over time.

On the first point, more predictors can be added and the best ones can always be selected.
On the other points, preprocessing the data may help, but not completely.

I lean towards a combination of all these factors. The point is that even on cross-validation the accuracy is only 55-60%, and it falls as the forecast horizon increases. So the predictors don't give much information even on the training sample.

Whereas if I saw high accuracy on the test that dropped sharply on validation, with the experiment constructed correctly, that would mean the dependencies are non-stationary.
Most likely they do.
 
Alexey Burnakov:
I do not know for sure.

Either the available predictors don't carry enough information,
or the relationships change a lot over time and generalisability decreases,
or the predictors themselves change their distribution parameters over time.

On the first point, more predictors can be added and the best ones can always be selected.
On the other points, preprocessing the data may help, but not completely.

I lean towards a combination of all these factors. The point is that even on cross-validation the accuracy is only 55-60%, and it falls as the forecast horizon increases. So the predictors don't give much information even on the training sample.

Whereas if I saw high accuracy on the test that dropped sharply on validation, with the experiment constructed correctly, that would mean the dependencies are non-stationary.

I've already written, I'll say it again.

I have done the work of selecting predictors several times, including commissioned work. The results are given below.

So.

Take some set of predictors, at least 50, and preferably more than a hundred.

All sets of predictors I dealt with (i.e. I do not claim to generalize) can be divided into two parts:

  • the part of the predictors that is related to the target variable
  • the part that has nothing to do with the target variable: noise

I write "relation" very carefully and quite deliberately do not use any terms.

Example of predictors:

  • the moving average does NOT relate to the target variable ZZ
  • the deviation of price from that moving average IS related to the target variable ZZ

Note that I am specifying the target variable: for a different target variable it may be the other way around.

The problem with having these two groups of predictors in the original set is that the standard tools for determining IMPORTANCE do not work. So other tools are needed; I have developed and use ones that allow the noise predictors to be coarsely sifted out. Note that there is no unambiguity here. The algorithm produces a score separately for real-valued and nominal predictors. Less than 2 (some relative value) is definitely noise. Between 2 and 3 it can be used, but better not...
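The screening tool itself is not published in the thread, so the following is only a generic illustration of the idea of separating noise predictors from related ones, using the Boruta package, which compares each predictor's importance against that of shuffled "shadow" copies; the synthetic data are mine, and the 2/3 thresholds mentioned above have no counterpart here:

    library(Boruta)

    # Synthetic illustration: 3 informative predictors, 47 pure-noise predictors.
    set.seed(1)
    n <- 500
    X <- as.data.frame(matrix(rnorm(n * 50), n, 50))
    y <- factor(ifelse(X$V1 + X$V2 - X$V3 + rnorm(n) > 0, "up", "down"))

    scr <- Boruta(X, y)            # importance compared against shuffled "shadow" predictors
    getSelectedAttributes(scr)     # predictors confirmed as non-noise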

The problem with noise predictors is that they crowd out the predictors that are genuinely related to the target. For some reason the randomforest, ada and svm algorithms build the model mostly on these noise predictors.

Having screened out the noise predictors (in my sets they were about 80% (!)), we take the remaining list of predictors and start applying R's tools for determining variable importance to it. The number of predictors actually used to train the model is about half of the non-noise predictors, i.e. about 10% of the original set.

I determine the importance of predictors in a window. As the window moves, the list of predictors drawn from that basic 20% changes all the time: 12-15 predictors are used to build the model, but they differ as the window moves along with the quotes.
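A sketch of how such a windowed importance list can be recomputed with randomForest; the window length, the step and the 15-predictor cut-off are arbitrary choices of mine, not the settings used above:

    library(randomForest)

    window <- 2000
    step   <- 500
    starts <- seq(1, nrow(dat) - window + 1, by = step)

    top15 <- lapply(starts, function(s) {
      chunk <- dat[s:(s + window - 1), ]
      rf    <- randomForest(target ~ ., data = chunk, importance = TRUE)
      imp   <- importance(rf, type = 1)                            # mean decrease in accuracy
      head(rownames(imp)[order(imp[, 1], decreasing = TRUE)], 15)
    })

    sapply(top15, paste, collapse = ", ")   # the selected 15 usually differ from window to window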

What's the reason?

Well, the point is that cleansing the set of predictors of the noise ones leads to models that are NOT overfitted.

In numbers.

On the full set of predictors it is possible to build models with a 3-5% prediction error! And any algorithm that splits the sample into parts, the so-called "out-of-sample" (OOS) check, confirms this result. This is very clearly seen in Rattle, which always splits the original sample into parts and is very happy with the results.

But.

If the initial sample contains noise predictors and we take a real "out-of-sample", e.g. train on the sample from 01.06.2015 to 01.01.2016 and then compute on the sample after 1 January, we can easily get an error of 50% or even 70% instead of 3-5%! Moreover, the further from 1 January, the worse the result.

THE MODEL IS OVERFITTED
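The contrast described here, a tiny error on a random in-sample split versus a large error on a genuinely later period, can be checked directly; a sketch of the two estimates side by side, where dat, its date column and the cut-off date are assumptions:

    library(randomForest)

    # 1) "Out-of-sample" by random split inside the same period (the kind of split the post says Rattle is happy with)
    set.seed(1)
    idx  <- sample(nrow(dat), floor(0.7 * nrow(dat)))
    rf1  <- randomForest(target ~ . - date, data = dat[idx, ])
    err1 <- mean(predict(rf1, dat[-idx, ]) != dat$target[-idx])

    # 2) Real out-of-time check: train up to 01.01.2016, test strictly after it
    train <- dat[dat$date <  as.Date("2016-01-01"), ]
    test  <- dat[dat$date >= as.Date("2016-01-01"), ]
    rf2   <- randomForest(target ~ . - date, data = train)
    err2  <- mean(predict(rf2, test) != test$target)

    c(random_split = err1, out_of_time = err2)   # with noise predictors, err2 is usually far worse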

If I cleanse the original set of noise predictors, the results are the same for randomforest, ada, SVM and several other models (i.e. the choice of model decided nothing in my cases): the prediction error is about 30% on any set. Applying R's predictor-importance tools reduces the error further, to around 25%. I could not improve on this result for the target variable ZZ.

 
СанСаныч Фоменко:

I have done the work of selecting predictors several times, including commissioned work. Having screened out the noise predictors (about 80% of my sets), we take the remaining predictors and apply R's variable-importance tools to them; I determine the importance in a window, and the 12-15 predictors used to build the model change as the window moves. After cleansing the set of noise predictors, the prediction error is about 30% for randomforest, ada, SVM and other models alike; R's importance tools reduce it to around 25%.

Thank you.

I see what you mean. From all of the above I see one possibility: calculate the importance of predictors on several parts of the training sample, then compare the lists and keep the predictors that recur in all of them.
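A sketch of that idea: split the training sample into several consecutive parts, rank the predictors by randomForest importance in each part, and keep only the names that appear in every list. The split into 3 parts and the top-20 cut-off are my own choices:

    library(randomForest)

    parts <- split(seq_len(nrow(dat)), cut(seq_len(nrow(dat)), breaks = 3, labels = FALSE))

    top20 <- lapply(parts, function(idx) {
      rf  <- randomForest(target ~ ., data = dat[idx, ], importance = TRUE)
      imp <- importance(rf, type = 1)
      head(rownames(imp)[order(imp[, 1], decreasing = TRUE)], 20)
    })

    Reduce(intersect, top20)   # predictors that come out as important in every part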

I can't say anything about manual selection, I prefer to use the machine right away.

PS: I'll try to apply my homebrew method based on the mutual information function, in addition to the variable importance from the decision forest. I'll show the results later.
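The mutual-information method itself isn't shown, but the basic ranking it relies on can be sketched with the infotheo package; the discretisation settings and the column names are assumptions:

    library(infotheo)

    # dat: data frame of predictors plus a factor column "target" (hypothetical names)
    X  <- discretize(dat[, setdiff(names(dat), "target")])    # equal-frequency binning by default
    mi <- sapply(X, function(col) mutinformation(col, dat$target))

    sort(mi, decreasing = TRUE)[1:15]   # predictors sharing the most information with the target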

 
СанСаныч Фоменко:


The problem with noise predictors is that they crowd out the predictors that are genuinely related to the target. For some reason the randomforest, ada and svm algorithms build the model mostly on these noise predictors.


Question: does SVM take interactions between variables into account, or is it just a weighted sum of individual components?