Optimisation and Out-of-Sample Testing - page 4

 

Yes, indeed!

In addition (also taking the criticism into account), I would like to raise a significant objection to those who do not quite agree with the simple implementation of the idea.

"One should not complicate the essence of things beyond what is necessary!" (Br. Occam)

The first optimization followed by the second (out-of-sample) is a simple and sufficient solution!

We are, after all, aiming to make a profit, not to engage in optimisation for the sake of optimisation.
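For concreteness, here is a minimal sketch of that two-stage procedure in Python. Everything in it is an illustrative assumption, not anyone's actual Expert Advisor: the toy_backtest function, the moving-average crossover strategy and the synthetic prices are all placeholders. The point is only the shape of the workflow: optimize on the in-sample part, then keep only the parameter sets that also survive the out-of-sample run.

```python
import numpy as np

def toy_backtest(prices, fast, slow):
    """Toy moving-average crossover backtest; returns total profit in points.
    A stand-in for a real Expert Advisor, for illustration only."""
    if fast >= slow:
        return float("-inf")
    fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    n = min(len(fast_ma), len(slow_ma))
    signal = np.where(fast_ma[-n:] > slow_ma[-n:], 1, -1)   # long / short
    returns = np.diff(prices[-n:])
    return float(np.sum(signal[:-1] * returns))

# Synthetic price series as a placeholder for real history.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 2000)) + 1000

# Stage 1: optimize only on the in-sample part.
split = int(len(prices) * 0.7)
in_sample, out_of_sample = prices[:split], prices[split:]

results = []
for fast in range(5, 30, 5):
    for slow in range(20, 120, 20):
        results.append((toy_backtest(in_sample, fast, slow), fast, slow))
results.sort(reverse=True)

# Stage 2: run the best in-sample candidates out of sample and keep
# only those that remain profitable there as well.
survivors = [(f, s) for profit, f, s in results[:10]
             if toy_backtest(out_of_sample, f, s) > 0]
print("in-sample top 10:", [(f, s) for _, f, s in results[:10]])
print("out-of-sample survivors:", survivors)
```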

 
Vita:

--> Splitting does the same thing (marking, checking). What extra does splitting bring compared to a straight-through run?


Splitting essentially filters the information, in terms of quantity.

Vita wrote:

--> If by "processing optimization results over the whole sample" you mean discarding set D, then I disagree: discarding results that give intermediate losses (in sample or out of sample) is an elementary task that is solved during optimization itself over the whole sample, i.e. no processing after optimization is needed. The result is set B right away. And no time-consuming additional checks outside the sample.

Do you mean optimization in the mathematical sense or in the real tester? The sample, as I understand it, now includes both samples. Do you really think you will save time by running three unnecessary sets over both samples? As for implementation: if you write your own tester from scratch, perhaps the implementation costs will be negligible. Try to implement it, and then we can come back to the question.

 
leonid553:

Yes, indeed!

In addition (also taking the criticism into account), I would like to raise a significant objection to those who do not quite agree with the simple implementation of the idea.

"One should not complicate the essence of things beyond what is necessary!" (Br. Occam)

The first optimization followed by the second (out-of-sample) is a simple and sufficient solution!

We are, after all, aiming to make a profit, not to engage in optimisation for the sake of optimisation.


So am I, and I am in complete agreement with Occam. You shouldn't do two optimisations - one is enough.

You are saying: "After optimizing an Expert Advisor, we will often have to tediously run more than a dozen sets of parameters suggested by the optimizer through the out-of-sample period."

A run over the whole data set, without dividing it into in-sample and out-of-sample, is a no less sufficient and even more straightforward solution.

 

Categorically disagree, Vita. Otherwise in neural networks there would be no division of all the data into three fundamentally different parts: the real optimization runs only on the first part; the second serves only to determine when to stop training; and the third is used just for a single test. That is, the real fitting goes on only on the first part, and on the third it is whatever it turns out to be... And the choice - "Occam's razor" or loss of confidence in the system - is left to the system's creator.

Roughly speaking, optimizing on A+B+C is not at all the same as the procedure described above.

 
Mathemat, I think what he means is collecting the whole set of results (don't forget the combinatorics), dividing it into 4 subsets and discarding three of them.
 
Mathemat:

Categorically disagree, Vita. Otherwise there would be no division of all the data into three parts in neural nets. Besides, the real optimization runs only on the first part; the second serves only to determine when to stop training, and the third is used just for a single test.


The division of data in neural networks into three parts exists, I suspect, in order to learn laws (like 2x2=4), to reveal regularities, and only if there are any. Otherwise the neural network will simply fit itself to the curve.

And then, as it seemed to me, the tester's task is not to train or to detect patterns, but to find the optimal set of parameters. This can be done by simple brute force, by genetic algorithms, or perhaps by a neural network. But once a set of optimal parameters has been found for a sample, how do you avoid curve fitting? What is the principle? Which bad elements of the set disappear when you test out of sample?

 
lna01:
Vita:

--> Splitting does the same thing (marking, checking). What extra does splitting bring compared to a straight-through run?


Splitting essentially filters the information, in terms of quantity.

--> Splitting filters out results that show a loss in-sample or out-of-sample but a cumulative gain overall. That is not something I would like to discard.

Vita wrote:

--> If by "processing optimization results over the whole sample" you mean discarding set D, then I disagree: discarding results that give intermediate losses (in sample or out of sample) is an elementary task that is solved during optimization itself over the whole sample, i.e. no processing after optimization is needed. The result is set B right away. And no time-consuming additional checks outside the sample.

Do you mean optimization in the mathematical sense or in the real tester? The sample, as I understand it, now includes both samples. Do you really think you will save time by running three unnecessary sets over both samples? As for implementation: if you write your own tester from scratch, perhaps the implementation costs will be negligible. Try to implement it, and then we can come back to the question.

--> " Sampling, as I understand it, now includes both samples." - sorry, I didn't mean to make you think that. Forget about it.

What I meant to say was that the real MetaTrader tester allows you to get the same results when optimizing over the sample + out-of-sample population as when optimizing over the sample followed by testing outside the sample. In the tester, the "Expert properties" button and then the "Testing" and "Optimization" tabs let you get rid of losses of any length and depth you want. And since I maintain that optimizing over a sample followed by an out-of-sample test gets rid of nothing else and adds nothing at all, this is the solution to the problem.

Unfortunately, it is precisely in the mathematical sense that one can achieve perfect optimization of parameters to any given curve. The trick of supposedly testing "for the future" outside the sample is a hidden, but still the same, trivial optimization over the whole given population of sample + out-of-sample. No guarantees for the future, no escape from curve fitting. A workable set of parameters has to be found in some other way.

 
The division of data in neural networks into three parts exists, I suspect, in order to learn laws (like 2x2=4), to reveal regularities.

That's right, Vita, for identifying patterns. And what are we doing here? Two more data sets are introduced precisely so that this identification does not degenerate into trivial "memorizing" of the initial population (= curve fitting). In a NN it works like this: training is carried out on set A (the training set) in the direction of reducing the target function (usually a prediction or classification error). Training is organized so that the error on A decreases monotonically.

At the same time, the error with the same parameters is measured on set B (the validation set). There the error behaves non-monotonically: first it falls, reaches a minimum, then starts to rise. As soon as the minimum error on set B is reached, training stops. Continuing training on A, even though the error there keeps decreasing, leads to curve fitting, because the error on set B is already growing. At this point the network's ability to generalize is said to decline. That is why training is forcibly stopped, without pushing the fit on A to the limit (and this is the fundamental difference between this algorithm and the fitting done by the MetaQuotes optimizer).

Finally, the set of parameters at which training stopped is run over the third set C (the test set). This is the true check of training quality, because the data in C had no influence on the training at all. Of course, there is no guarantee of stable NN operation with the parameters found, but this algorithm weeds out at least 95% of the pseudo-grails that every other poster presents here on the forum :).
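To make the A/B/C scheme concrete, here is a minimal sketch in Python. It uses synthetic data, a toy polynomial model and hand-rolled gradient descent; all of that is an illustrative assumption, not MetaTrader code. What matters is the mechanism: the fit is driven only by set A, set B decides when to stop, and set C is touched exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with a hidden law plus noise, split into the three sets
# described above: A (training), B (validation), C (test).
x = rng.uniform(-1, 1, 600)
y = np.sin(3 * x) + rng.normal(0, 0.3, x.size)
xa, xb, xc = x[:400], x[400:500], x[500:]
ya, yb, yc = y[:400], y[400:500], y[500:]

def design(x, degree=9):
    """Polynomial features: flexible enough to memorize the noise."""
    return np.vander(x, degree + 1, increasing=True)

def mse(w, x, y):
    return float(np.mean((design(x) @ w - y) ** 2))

# Gradient descent on set A: the error on A falls monotonically.
w = np.zeros(10)
best_w, best_val, patience = w.copy(), np.inf, 0
for epoch in range(5000):
    grad = 2 * design(xa).T @ (design(xa) @ w - ya) / xa.size
    w -= 0.05 * grad
    val = mse(w, xb, yb)            # the same error, measured on set B
    if val < best_val:
        best_val, best_w, patience = val, w.copy(), 0
    else:
        patience += 1
    if patience > 200:              # error on B has stopped improving: halt
        break

# Single final check on set C, which never influenced the training.
print("stopped after epoch", epoch)
print("error on A:", mse(best_w, xa, ya),
      "on B:", best_val,
      "on C:", mse(best_w, xc, yc))
```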

And a simple parameter sweep on a single data fragment is the purest curve fitting, ideal in the training area and worthless outside it.

Of course, MT4 is not a neural network package and the general algorithm would have to be adapted, but it is still better than the trivial curve fitting that we call "optimization", hehe...

 
Vita:

I wanted to say that the real MetaTrader tester allows you to get the same results when optimizing over the sample + out-of-sample population as when optimizing over the sample followed by an out-of-sample test. The "Expert properties" button and the "Testing" and "Optimization" tabs let you get rid of losses of any length and depth.

It all depends on how the task is defined. If we neglect how evenly the profit is distributed over the testing period, the MT tester's standard capabilities really are sufficient and the time spent will be comparable. Is it worth neglecting this? Everyone has their own experience and views. The process can indeed be called fitting, but I think the term approximation would be more accurate. Not every approximation can be extrapolated into the future, and the profit-uniformity criterion allows precisely those variants that are obviously unsuitable for extrapolation to be rejected. IMHO, of course.
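That uniformity criterion is not part of the standard report, but it is easy to apply as a post-filter over the equity curves produced by optimization. Here is a minimal sketch in Python; the function name, the segment count and the toy equity curves are my own assumptions, shown only to illustrate the idea of rejecting "lucky stretch" results.

```python
import numpy as np

def profit_uniformity(equity, segments=4):
    """Filter for parameter sets whose profit is concentrated in a few lucky
    stretches: require every segment of the test period to be profitable
    and measure how close the equity curve is to a straight line (R^2)."""
    equity = np.asarray(equity, dtype=float)
    chunks = np.array_split(np.diff(equity), segments)
    all_segments_profitable = all(c.sum() > 0 for c in chunks)
    t = np.arange(equity.size)
    slope, intercept = np.polyfit(t, equity, 1)
    fitted = slope * t + intercept
    ss_res = np.sum((equity - fitted) ** 2)
    ss_tot = np.sum((equity - equity.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return all_segments_profitable, r2

# Two hypothetical equity curves with the same final profit:
steady = np.cumsum(np.full(100, 1.0))                      # even growth
lucky = np.concatenate([np.zeros(90), np.cumsum(np.full(10, 10.0))])
print(profit_uniformity(steady))   # (True, ~1.0) -> keep
print(profit_uniformity(lucky))    # (False, ...) -> reject: one segment has no profit
```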

 
Mathemat:

And a simple parameter sweep on a single data fragment is the purest curve fitting, ideal in the training area and worthless outside it.

--> Exactly, it is.

But what does breaking the data up into fragments actually give? What does "pseudo-training" on three different fragments lead to? To a profit on each individual fragment? How is such a fit any better? Does it guarantee profitability outside those fragments? If you believe that, then by all means. What's more, the tester gives you the ability to bend the curve so that you make a profit on each fragment A, B, C...

Just let's leave neural networks out of this, because they are not even close to the topic. People are doing this chore by hand for the sake of a dubious advantage which, by the way, I did point out, yet I have heard nothing further about how the results of optimization on a sample followed by an out-of-sample test are better than a plain straight-through run. You see, we are talking about real work and results here, not neural network theory. Would you be so kind as to point out any real advantages, if there are any, other than the one I pointed out?
