Bayesian regression - Has anyone made an EA using this algorithm? - page 26

 
СанСаныч Фоменко:

Yes, I did... I don't remember...

Speaking of the tester, here is the problem as I see it.

We take some sample and use the tester to calculate, say, the profit factor. Then we take another sample and get a new profit factor value. In total we get two figures. Can two figures be the basis for statistical conclusions? By themselves these figures mean nothing at all.

The problem has to be, and is, solved differently.

A sample is taken. A subset is randomly drawn from it and the profit factor is computed on that subset. Then another random subset is drawn, and so on, say 1000 times. You end up with 1000 profit factors. That set can already serve as the basis for statistical conclusions.

By the way, this method does not rule out using the tester, a demo account...
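In R, the resampling idea can be sketched in a few lines; the trade results below are synthetic, purely to show the mechanics.

```
# Hypothetical per-trade profit/loss values; in practice they would come from the tester report
set.seed(42)
trade_pl <- rnorm(200, mean = 3, sd = 25)

# Profit factor = gross profit / gross loss
profit_factor <- function(pl) sum(pl[pl > 0]) / abs(sum(pl[pl < 0]))

# Draw a random subsample (with replacement) 1000 times and compute the profit factor on each draw
pf_boot <- replicate(1000, profit_factor(sample(trade_pl, replace = TRUE)))

# The distribution of these 1000 values is what statistical conclusions can rest on
quantile(pf_boot, c(0.05, 0.50, 0.95))
hist(pf_boot, main = "Bootstrap distribution of the profit factor")
```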

SS, good day!

A sample of what? Deals?

I see what you're getting at: bootstrap-style estimation of the quantiles of a sample statistic.

But the question remains: if the trading system (TS) is fitted on the whole available sample, this method will only show what we already know, namely that the profit factor is oh-so-good.

I'm going to approach the question a little differently in my blog. First, training with k-fold cross-validation: take 5 subsets (in my case, data from 5 currency pairs) and you get an interesting experiment: train the model on 4 pairs and test it on the 5th. I want to repeat this experiment 5 times. But that's not all. Originally I was mainly interested in showing that forex is predictable. So the m best models (at the cross-validation stage) will then go through validation on a large sample the machine has not yet seen. Another property of the validation sample is that it consists of prices lying further ahead in time. I expect to get a 100 * 18 matrix with validation results for the top 100 models on 18 predicted output variables (from 2 to 724 minutes into the future), e.g. by the R^2 criterion. This will immediately show whether my "boxes" predict better than the mean.
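For instance, a rough R sketch of that leave-one-pair-out scheme; the data frame dat, its pair and target columns, and randomForest as the model are placeholders for the example, not the actual setup.

```
library(randomForest)

r_squared <- function(actual, predicted)
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

pairs <- unique(dat$pair)                      # the 5 currency pairs
cv_r2 <- sapply(pairs, function(p) {
  train <- dat[dat$pair != p, ]                # train on the other 4 pairs
  test  <- dat[dat$pair == p, ]                # test on the held-out pair
  model <- randomForest(target ~ . - pair, data = train, ntree = 500)
  r_squared(test$target, predict(model, newdata = test))
})
cv_r2                                          # one out-of-pair R^2 per currency pair
```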

And if there were a sponsor for my self-education, one could rent Amazon cloud machines and repeat the experiment 1,000 times, each time generating different training and validation sets. That would give a 1000 * 100 * 18 three-dimensional matrix from which to estimate the standard error of the R^2 metric for the best models and for the different targets. But that is already getting heavy.

My point is that we should test on future data (or on past data, but clearly separated in time from the training data), and all will be well.
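And the last point in the same sketch style: validate only on observations strictly later in time than anything used for training (the 70/30 split point is arbitrary, and dat/target/pair are the same placeholders as above).

```
library(randomForest)

r_squared <- function(actual, predicted)
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

# Rows of dat are assumed to be ordered by time
cut_row   <- floor(0.7 * nrow(dat))
train_set <- dat[1:cut_row, ]                        # past data only
valid_set <- dat[(cut_row + 1):nrow(dat), ]          # strictly future data
model     <- randomForest(target ~ . - pair, data = train_set, ntree = 500)
r_squared(valid_set$target, predict(model, newdata = valid_set))
```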

 
Alexey Burnakov:

SS, good day!

A sample of what? Deals?

I see what you're getting at: bootstrap-style estimation of the quantiles of a sample statistic.

But the question remains: if the trading system (TS) is fitted on the whole available sample, this method will only show what we already know, namely that the profit factor is oh-so-good.

I'm going to approach the question a little differently in my blog. First, training with k-fold cross-validation: take 5 subsets (in my case, data from 5 currency pairs) and you get an interesting experiment: train the model on 4 pairs and test it on the 5th. I want to repeat this experiment 5 times. But that's not all. Originally I was mainly interested in showing that forex is predictable. So the m best models (at the cross-validation stage) will then go through validation on a large sample the machine has not yet seen. Another property of the validation sample is that it consists of prices lying further ahead in time. I expect to get a 100 * 18 matrix with validation results for the top 100 models on 18 predicted output variables (from 2 to 724 minutes into the future), e.g. by the R^2 criterion. This will immediately show whether my "boxes" predict better than the mean.

And if there were a sponsor for my self-education, one could rent Amazon cloud machines and repeat the experiment 1,000 times, each time generating different training and validation sets. That would give a 1000 * 100 * 18 three-dimensional matrix from which to estimate the standard error of the R^2 metric for the best models and for the different targets. But that is already getting heavy.

My point is that we should test on future data (or on past data, but clearly separated in time from the training data), and all will be well.

The key word of your post is highlighted in red.

Over-training (overfitting) a model is a methodological problem for all of science: an over-trained model (in any branch of knowledge) picks up particular features of the training data that are not found outside it, and in doing so it fails to capture the patterns that hold for the general population.

My understanding is that this problem is not solved by the models themselves, nor by any statistical testing techniques: you correctly note that over-trained results will look justified. In my hands, attempts to apply model "coarsening" techniques have not led to results.

To me, the problem of over-training is generated entirely by the input data set. On an intuitive level: do the raw data (predictors) relate to the target variable or not? The extreme case, where none of them do, I don't consider. The intermediate case: some do and some don't. A perfectly typical situation, in my experience, is when predictors that are not relevant to the target variable "drown out" the predictors that are. If you manage to manually screen out the most blatant noise predictors, then at some point the predictor-selection algorithms start to work.

The numbers are as follows. From an initial set of 50-200 predictors I select about 30 by my own methods. These 30 do not generate over-trained models, i.e. performance on the training, AOVA, testing and validation samples is approximately the same. Then, using packages (I use varSelRF; others are possible too), I select predictors over a sliding window of about 300-500 bars and get a working set of 10-15 predictors. As the window moves, the composition of that set changes. None of the resulting sets of predictors leads to over-training either, and performance improves by 5-10%. For classification tasks that means an error of about 20%.
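Roughly, the sliding-window selection step could look like this in R, assuming the ~30 pre-screened predictors sit in a matrix X and the class labels in a factor y, both ordered by time; the window length, step and varSelRF parameters here are only illustrative, not the settings actually used.

```
library(varSelRF)

window <- 400                                   # roughly the 300-500 bars mentioned above
step   <- 100
starts <- seq(1, nrow(X) - window + 1, by = step)

selected_sets <- lapply(starts, function(s) {
  idx <- s:(s + window - 1)
  fit <- varSelRF(X[idx, ], y[idx], ntree = 500, ntreeIterat = 300)
  fit$selected.vars                             # the predictors kept for this window
})
# Comparing the elements of selected_sets shows how the composition drifts as the window moves
```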

 
СанСаныч Фоменко:


To me, the problem of over-training is generated entirely by the input data set. On an intuitive level: do the raw data (predictors) relate to the target variable or not? The extreme case, where none of them do, I don't consider. The intermediate case: some do and some don't. A perfectly typical situation, in my experience, is when predictors that are not relevant to the target variable "drown out" the predictors that are. If you manage to manually screen out the most blatant noise predictors, then at some point the predictor-selection algorithms start to work.

The numbers are as follows. From an initial set of 50-200 predictors I select about 30 by my own methods. These 30 do not generate over-trained models, i.e. performance on the training, AOVA, testing and validation samples is approximately the same. Then, using packages (I use varSelRF; others are possible too), I select predictors over a sliding window of about 300-500 bars and get a working set of 10-15 predictors. As the window moves, the composition of that set changes. None of the resulting sets of predictors leads to over-training either, and performance improves by 5-10%. For classification tasks that means an error of about 20%.

You can deal with the noise by selecting variables, as you said. But am I right that you are saying noisy data prevents the selection of good variables, and that the noise should be found and removed beforehand by some heuristics?

And another question. You're using a kind of decision forest, as I understand from this acronym (I haven't used such a method myself).

Each tree in a decision forest is a greedy variable-selection algorithm plus a metric for drawing a split (a cut-off boundary) over the range of values of a variable. Being "greedy", the trees should therefore pick the most individually significant variables at the top levels of the branching. At the lower levels of a tree (starting as early as the second level, in fact) the next most important variables are already selected on a subsample of observations defined by the cut-offs made above. Within that subsample we again look for the next most important variable (for each variable we run through all possible cut-offs and pick the best one, and the variable's relevance metric is computed for that cut-off). But since this is already done on a subsample of observations, a conditional distribution is effectively at work: variable #18 is best in subsample #4 given that variable #2 was used before it. So the deeper the tree, the more the significance of a variable depends on the variables used above it.

According to the authors' idea, the noise variables should end up left overboard. BUT! The greedy nature of the approach means that, for example, a variable rejected at level 1 might, in combination with some other variable at a possible level 2, have given a significant gain in the target metric, yet that combination is already thrown overboard. In the extreme case, two independent variables, each of which is noise with respect to the target variable, may have a joint distribution with the target such that together they form a significant interaction. A greedy model will overlook this, although, admittedly, such a configuration of data is rare.
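That extreme case is easy to reproduce on synthetic data: two variables, each pure noise with respect to the target on its own, whose joint distribution determines the target completely. Everything below is made up, purely to show the configuration.

```
set.seed(1)
n  <- 2000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
y  <- factor(ifelse(xor(x1 == 1, x2 == 1), "up", "down"))   # y is exactly the XOR of x1 and x2

# Marginally, each variable looks like noise: every row of the table is roughly 50/50
table(x1, y)
table(x2, y)

# Jointly, the pattern is exact: every (x1, x2) cell is pure
table(interaction(x1, x2), y)
```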

What do you think?

 
Please answer one question: has anyone here made money on forex? I've already spent 5 years of my life and over a million roubles, and so far nothing has come of it.
 
Alexey Burnakov:

You can deal with the noise by selecting variables, as you said. But am I right that you are saying noisy data prevents the selection of good variables, and that the noise should be found and removed beforehand by some heuristics?

Exactly.

And another question. You are using a kind of decision forest, as I understood from this acronym (I haven't used such a method myself).

The forest is just an example; the model you choose does not solve the problem described above, where noise variables "drown out" the normal ones: variable selection is an independent problem in its own right.

 
Mikhail Gorenberg:
Please answer one question: has anyone here made money on forex? I've already spent 5 years of my life and over a million roubles, and so far nothing has come of it.
Start with signals, learn how to select profitable ones...
 
Mikhail Gorenberg:
Please answer one question: has anyone here made money on forex? I've already spent 5 years of my life and over a million roubles, and so far nothing has come of it.
No wonder: you're not enough of a nerd when it comes to quantiles, cross-validations and bootstraps.
 
Mikhail Gorenberg:
Please answer one question: has anyone here made money on forex? I've already spent 5 years of my life and over a million roubles, and so far nothing has come of it.

I ended up roughly at zero (maybe a slight plus). All my ventures have been ruined by greed.

Even if you have a profitable trading robot (I have one), and I mean genuinely profitable and consistent, when it is running and you are waiting months for results, 30% growth starts to look like not enough. And then it goes further. In other words, even with fully automated trading, where the risks are calculated and the maximum drawdown is controlled, I personally do not feel psychologically comfortable.

That is why I decided to take up quantiles.

PS: if I make a good automated trading system, I'll make it available to everybody. And I'll rent it out to someone with money.

 
СанСаныч Фоменко:

You can deal with the noise by selecting variables, as you said. But am I right that you are saying noisy data prevents the selection of good variables, and that the noise should be found and removed beforehand by some heuristics?

Exactly.

And another question. You're using a kind of decision forest, as I understand from this acronym (I haven't used such a method myself).

The forest is just an example; the model you choose does not solve the problem described above, where noise variables "drown out" the normal ones: variable selection is an independent problem in its own right.

Why does noise drown out the signal? It is not entirely clear to me.

You see, if the model itself incorporates feature regularisation, then the noise data ends up being ignored once its effect on the target variable has been weighed. If the model provides no protection against over-training (for example, no lower threshold is set on the number of observations in a terminal node of a tree), then all the noise will be used. But even within random forest this is solvable.
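In the randomForest package that protection is, for example, the nodesize argument (the minimum number of observations in a terminal node); a quick sketch with placeholder data names (train, target) and arbitrary values.

```
library(randomForest)

# nodesize = 1 lets the trees grow until single observations sit in the leaves (prone to fitting noise);
# a larger value forces every terminal node to hold at least that many observations
rf_deep   <- randomForest(target ~ ., data = train, ntree = 500, nodesize = 1)
rf_coarse <- randomForest(target ~ ., data = train, ntree = 500, nodesize = 50)

rf_deep     # printing the fitted objects shows the out-of-bag estimates for comparison
rf_coarse
```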

Could you explain your idea in more detail?

 
Alexey Burnakov:

Why does noise drown out the signal? It is not entirely clear to me.

You see, if the model itself incorporates feature regularisation, then the noise data ends up being ignored once its effect on the target variable has been weighed. If the model provides no protection against over-training (for example, no lower threshold is set on the number of observations in a terminal node of a tree), then all the noise will be used. But even within random forest this is solvable.

Could you explain your idea in more detail?

Why does noise drown out the signal? It is not entirely clear to me.

I don't know.

I have a program that calculates a measure of how a predictor relates to the target variable. It is some abstract value.

Less than 1: noise.

1-2: better not to get involved with it.

Over 3: good.

Over 5: a stroke of luck, but rare.

So if there are too many predictors with a value below 2, I cannot separate out the useful predictors by any means other than my own. How to interpret this, I don't know. And don't forget that predictors act not only on the target variable but also on each other. Very often it is not only the list of predictors to delete that matters, but also the order in which they are deleted.
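The measure itself is my own program, so purely to illustrate the idea (score each predictor against the target and treat low scores as noise), here is one hypothetical stand-in: compare each predictor's random-forest importance with the importance of deliberately added noise columns. This is not the actual measure.

```
library(randomForest)

# X: matrix of candidate predictors (with column names), y: factor with the target classes
noise <- matrix(rnorm(nrow(X) * 5), ncol = 5,
                dimnames = list(NULL, paste0("noise", 1:5)))
rf  <- randomForest(x = cbind(X, noise), y = y, ntree = 1000, importance = TRUE)
imp <- importance(rf, type = 1)                        # mean decrease in accuracy

# Hypothetical "relevance" score: importance relative to the best pure-noise column
score <- imp[colnames(X), ] / max(imp[paste0("noise", 1:5), ])
sort(score, decreasing = TRUE)                         # values near or below 1 behave like noise
```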

What's my point?

The basic problem in trading is over-training (overfitting); it is a problem in its own right and does not depend on the models and methods used.
