Machine learning in trading: theory, models, practice and algo-trading - page 2039

 
elibrarius:

You can shuffle within the train sample or within the test sample, but there is no point, and you must not shuffle between train and test. You didn't shuffle them by any chance? The result on the test and the exam is suspiciously good.

I did not shuffle, and the result on the test sample is actually not very good: Recall is low.

But shuffling is acceptable if we assume the pattern is stable and does not fade toward the end of the sample. Here, the file may simply have been generated in a loop, so some parameter values never made it into the training part.
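The suspicion that a loop-generated file left some parameter values out of the training part can be checked directly by comparing, column by column, which values appear in the test part but never in the training part. A minimal sketch, assuming the rows are already split chronologically and loaded as lists of numbers; `unseen_values` and the toy rows are hypothetical.

```python
from typing import Dict, Sequence, Set


def unseen_values(train: Sequence[Sequence[float]],
                  test: Sequence[Sequence[float]]) -> Dict[int, Set[float]]:
    """Per column, return the values that occur in `test` but never in `train`."""
    result: Dict[int, Set[float]] = {}
    for col in range(len(train[0])):
        seen = {row[col] for row in train}
        missing = {row[col] for row in test} - seen
        if missing:
            result[col] = missing
    return result


# Toy rows (hypothetical): column 1 is a loop counter, so its later values never reach the train part.
train_rows = [[4, 100], [4, 200], [4, 300]]
test_rows = [[4, 400], [4, 500]]
print(unseen_values(train_rows, test_rows))
```

A non-empty result means the model is evaluated on parameter values it never saw during training.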

 
Aleksey Vyazmikin:

I did not shuffle, and the result on the test sample is actually not very good: Recall is low.

But shuffling is acceptable if we assume the pattern is stable and does not fade toward the end of the sample. Here, the file may simply have been generated in a loop, so some parameter values never made it into the training part.

If you shuffle, the test result improves immediately because of peeking through the neighboring bar: one bar (10:00) ends up in train and the adjacent one (10:01) in test, and the two are very similar both in their past values and in their target.
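The peeking effect can be measured directly: after a split, count how many test bars have an immediate neighbor (the previous or the next bar) in the training set. A minimal sketch on bar indices only; `neighbor_leak_fraction` is a hypothetical helper, and a neighbor distance of one bar is an assumption (with overlapping feature windows the effective distance is larger).

```python
import random
from typing import Iterable, Set


def neighbor_leak_fraction(train_idx: Iterable[int], test_idx: Iterable[int]) -> float:
    """Share of test bars whose previous or next bar is in the training set."""
    train: Set[int] = set(train_idx)
    test = list(test_idx)
    leaked = sum(1 for i in test if (i - 1) in train or (i + 1) in train)
    return leaked / len(test)


n_bars = 10_000
bars = list(range(n_bars))
cut = int(n_bars * 0.7)

# Chronological split: the first 70% of bars for training, the rest for testing.
chrono_train, chrono_test = bars[:cut], bars[cut:]

# Random split: shuffle first, then cut, so neighbors end up on both sides.
rng = random.Random(42)
shuffled = bars[:]
rng.shuffle(shuffled)
rand_train, rand_test = shuffled[:cut], shuffled[cut:]

print(neighbor_leak_fraction(chrono_train, chrono_test))  # only the boundary bar leaks
print(neighbor_leak_fraction(rand_train, rand_test))      # most test bars have a neighbor in train
```

With a 70/30 random split, a test bar has no neighbor in train only when both neighbors also landed in test, which happens for roughly 0.3 × 0.3 = 9% of test bars.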
 
elibrarius:
If you shuffle, the test result improves immediately because of peeking through the neighboring bar: one bar (10:00) ends up in train and the adjacent one (10:01) in test, and the two are very similar both in their past values and in their target.

Are the bars in that sample consecutive? I don't train on every bar, only on signals, and in that case I believe shuffling is acceptable: it gives training more information without increasing the sample size, which, according to my data, improves the results on the exam sample.
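Whether shuffling signal bars is safe depends on how far apart the signals are. If each sample uses `lookback` bars of history as features and a target that looks `horizon` bars ahead, two signals at bars i < j share raw data whenever j - i < lookback + horizon. A minimal sketch that lists such overlapping pairs among consecutive signals; `lookback`, `horizon`, the signal positions and the function name are hypothetical, not settings from this thread.

```python
from typing import List, Sequence, Tuple


def overlapping_signal_pairs(signal_bars: Sequence[int], lookback: int,
                             horizon: int) -> List[Tuple[int, int]]:
    """Return consecutive signal pairs whose data windows overlap.

    A sample at bar i is assumed to use bars [i - lookback + 1, i + horizon]:
    `lookback` bars of features and a target `horizon` bars ahead.
    """
    bars = sorted(signal_bars)
    return [(a, b) for a, b in zip(bars, bars[1:]) if b - a < lookback + horizon]


signals = [10, 12, 60, 200, 205]
print(overlapping_signal_pairs(signals, lookback=5, horizon=3))
```

Checking consecutive pairs is enough: if two signals further apart overlap, every signal between them overlaps its predecessor too. An empty result rules out this particular neighbor leak, though not other issues such as drift over time.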

 
Aleksey Vyazmikin:

Are the bars in that sample consecutive?

I have already deleted the file; check your own copy. I think writing the bars one after another is the most logical way to create a CSV file.

 
elibrarius:

I have already deleted the file; check your own copy. I think writing the bars one after another is the most logical way to create a CSV file.

There is something odd in there, and I don't know what it is:

 2 6 0 4 2 6 57 57 100 100 -1
4 2 6 0 4 2 6 57 57 100 200 -1
4 2 6 0 4 2 6 57 57 100 300 -1
4 2 6 0 4 2 6 57 57 100 400 -1
4 2 6 0 4 2 6 57 57 100 500 -1
4 2 6 0 4 2 6 57 57 100 600 -1
4 2 6 0 4 2 6 57 57 100 700 -1
4 2 6 0 4 2 6 57 57 100 800 -1
4 2 6 0 4 2 6 57 57 100 900 -1
4 2 6 0 4 2 6 57 57 100 1000 -1
4 2 6 0 4 2 6 57 57 100 1100 -1
4 2 6 0 4 2 6 57 57 100 1200 -1
4 2 6 0 4 2 6 57 57 100 1300 -1
4 2 6 0 4 2 6 57 57 100 1400 -1
4 2 6 0 4 2 6 57 57 100 1500 -1
4 2 6 0 4 2 6 57 57 100 1600 -1
4 2 6 0 4 2 6 57 57 100 1700 -1
4 2 6 0 4 2 6 57 57 100 1800 -1
4 2 6 0 4 2 6 57 57 100 1900 -1
4 2 6 0 4 2 6 57 57 100 2000 -1
4 2 6 0 4 2 6 57 57 100 2100 -1
4 2 6 0 4 2 6 57 57 100 2200 -1
4 2 6 0 4 2 6 57 57 100 2300 -1
4 2 6 0 4 2 6 57 57 100 2400 -1
4 2 6 0 4 2 6 57 57 100 2500 -1
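From the second line on, the rows above differ only in the second-to-last column, which grows by 100 per line; that pattern looks like a parameter sweep written out by a loop. A minimal sketch that counts, for consecutive rows, which columns change; it assumes whitespace-separated numeric rows as in the excerpt, and `changed_columns` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def changed_columns(lines: List[str]) -> Counter:
    """Count how often each column index changes between consecutive rows."""
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    counts: Counter = Counter()
    for prev, cur in zip(rows, rows[1:]):
        if len(prev) != len(cur):
            continue  # skip pairs with a different number of fields (e.g. a truncated row)
        for col, (a, b) in enumerate(zip(prev, cur)):
            if a != b:
                counts[col] += 1
    return counts


sample = """4 2 6 0 4 2 6 57 57 100 200 -1
4 2 6 0 4 2 6 57 57 100 300 -1
4 2 6 0 4 2 6 57 57 100 400 -1""".splitlines()
print(changed_columns(sample))  # Counter({10: 2}): only column 10 changes
```

If one column accounts for almost all changes over long stretches, the file is ordered by that loop and should not be cut into train and test without care.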
 
Aleksey Vyazmikin:

There is something odd in there, and I don't know what it is.

I'm waiting for a reply from the owner of the file.
 
elibrarius:
I'm waiting for a reply from the owner of the file.

By the way, I took 1% of the sample and trained a C4.5 tree on it: it gave 100% recognition on both the training and the test sample, so I suspect the cause is the ordered loops, which need to be shuffled. But I don't have a good shuffling algorithm in MQL5. I could just pull n lines from both the test and control samples and form the test sample that way, but then I may again be unlucky with class balance, and the loop order would still be a problem.
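When rows are pulled into a new sample at random, class balance can be preserved by sampling each class separately (stratified sampling). A minimal sketch, assuming the labels are held in a plain list; the function name, the labels and the 10% fraction are illustrative only.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_sample(labels: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Pick row indices so that each class keeps (roughly) its share of the sample."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    picked: List[int] = []
    for idxs in by_class.values():
        k = max(1, round(len(idxs) * fraction))  # at least one row per class
        picked.extend(rng.sample(idxs, k))       # sampling without replacement
    return sorted(picked)                        # keep the picked rows in time order


labels = [1] * 900 + [-1] * 100  # 90% / 10% class balance
sub = stratified_sample(labels, fraction=0.1)
print(len(sub), sum(1 for i in sub if labels[i] == -1))  # 100 10
```

Sorting the picked indices keeps the rows in chronological order, so the subsample can still be cut by time afterwards if needed.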

 
Aleksey Vyazmikin:

It's just that I don't train on every bar, only on signals, and in that case I believe shuffling is acceptable: it gives training more information without increasing the sample size, which, according to my data, improves the results on the exam sample.

Maybe you can, but I don't think you should throw part of the future into the train set; you can't do that in real trading. Let the future remain unknown during training, just as it is in reality.
The test and the exam will always come out sometimes better, sometimes worse. That's fine. The main thing is to stay in profit.
Use cross-validation (it may be built into your tool) or, better yet, walk-forward validation.
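Walk-forward validation trains on a window of history, tests on the period right after it, then shifts both windows forward. A minimal sketch of the index generation, assuming fixed window sizes in bars (the sizes below are arbitrary illustrations); each fold's test period lies strictly after its training period, so no future data reaches training.

```python
from typing import Iterator, Tuple


def walk_forward(n: int, train_size: int, test_size: int,
                 step: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for rolling walk-forward validation."""
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


for train, test in walk_forward(n=1000, train_size=500, test_size=100, step=100):
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

With `step` equal to `test_size`, the test periods tile the history without gaps or overlap, so their results can be joined into one out-of-sample record.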

 
Aleksey Vyazmikin:

By the way, I took 1% of the sample and trained a C4.5 tree on it: it gave 100% recognition on both the training and the test sample, so I suspect the cause is the ordered loops, which need to be shuffled. But I don't have a good shuffling algorithm in MQL5. I could just pull n lines from both the test and control samples and form the test sample that way, but then I may again be unlucky with class balance, and the loop order would still be a problem.

With an RNG like this you can shuffle properly: https://www.mql5.com/ru/blogs/post/735953
Rand 0 ... Max Int with a uniform distribution (www.mql5.com)
I needed an RNG function that generates an Int from 0 up to any value. This is the function I ended up with. I think the distribution came out uniform. On the forum, someone recommended another function from an article. After discarding the unnecessary parts, I got: I compared the speed of both functions, the original one from the article and plain MathRand(): The original from the article Rnd.Rand_01...
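The linked post is only summarized here, so its code is not reproduced. As a general illustration of the problem it addresses: MQL5's MathRand() returns integers only in 0..32767 (15 bits), so `MathRand() % (max + 1)` cannot reach values above 32767 and, for smaller ranges, is biased whenever 32768 is not a multiple of `max + 1`. A common remedy is to concatenate two 15-bit draws and reject the uneven tail (rejection sampling). A minimal Python sketch; `rand15` is a hypothetical stand-in for MathRand(), and this is not necessarily the method used in the post.

```python
import random

_rng = random.Random(1)


def rand15() -> int:
    """Hypothetical stand-in for MQL5 MathRand(): a uniform integer in 0..32767."""
    return _rng.getrandbits(15)


def rand_int(max_value: int) -> int:
    """Uniform integer in 0..max_value (max_value < 2**30) via rejection sampling."""
    if not 0 <= max_value < 2 ** 30:
        raise ValueError("max_value must be in [0, 2**30)")
    span = max_value + 1
    limit = (2 ** 30 // span) * span  # largest multiple of span that fits in 30 bits
    while True:
        x = (rand15() << 15) | rand15()  # 30 random bits from two 15-bit draws
        if x < limit:                    # reject the uneven tail to avoid modulo bias
            return x % span


print([rand_int(100_000) for _ in range(5)])
```

Rejection keeps every value in 0..max_value equally likely; since `limit` is at least half of 2**30, on average fewer than two attempts are needed per number.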
 
elibrarius:

Maybe you can, but I don't think you should throw part of the future into the train set; you can't do that in real trading. Let the future remain unknown during training, just as it is in reality.
The test and the exam will always come out sometimes better, sometimes worse. That's fine. The main thing is to stay in profit.
Use cross-validation (it may be built into your tool) or, better yet, walk-forward validation.

If we don't do that, and don't do what Maxim did (training from the end), then we end up training on very outdated data, which is not good. I'm not suggesting touching the test sample, but I think we can shuffle the study sample and the control sample. Then, if the results are stable, knowing the base settings of the model, we can retrain it closer to the right edge of the data before using it in real trading.
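The scheme proposed here (leave the exam sample untouched at the end of history, but shuffle the study and control samples together) can be sketched as follows. The 60/20/20 proportions, the seed and the function name are assumptions for illustration.

```python
import random
from typing import List, Tuple


def split_with_shuffled_pool(n_rows: int, exam_share: float = 0.2, control_share: float = 0.2,
                             seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Return (study, control, exam) row indices.

    The exam rows are the latest ones and stay in time order; study and control
    are drawn at random from a shuffled pool of all earlier rows.
    """
    exam_start = n_rows - int(n_rows * exam_share)
    exam = list(range(exam_start, n_rows))  # latest rows, never shuffled
    pool = list(range(exam_start))          # everything before the exam period
    random.Random(seed).shuffle(pool)       # shuffle only inside the pool
    n_control = int(n_rows * control_share)
    control, study = sorted(pool[:n_control]), sorted(pool[n_control:])
    return study, control, exam


study, control, exam = split_with_shuffled_pool(1000)
print(len(study), len(control), len(exam))  # 600 200 200
```

This still lets neighboring bars land in study and control at the same time, which is the leakage raised earlier in the thread; only the exam sample remains a clean out-of-sample check.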

I haven't tried cross-validation yet: it needs to be automated, and I haven't gotten around to that.

As for walk-forward tests, yes, that is the right approach, but I never have enough history for it. I use this approach when selecting leaves: there it works best, even on a sample that has already been used for training, because leaf responses are usually uneven across the sample, and I need exactly the stable ones.
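One way to check whether a leaf responds evenly across the sample is to split the bars where the leaf fires into consecutive chronological chunks and compare its precision chunk by chunk. A minimal sketch, assuming each activation is recorded as (bar index, whether the prediction was correct); the chunk count, the toy activations and the helper name are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_by_chunk(activations: Sequence[Tuple[int, bool]], n_bars: int,
                       n_chunks: int) -> List[float]:
    """Precision of one leaf in each of `n_chunks` equal chronological chunks of the sample.

    `activations` holds (bar_index, correct) pairs for the bars where the leaf fired.
    Chunks with no activations get NaN.
    """
    hits = [0] * n_chunks
    totals = [0] * n_chunks
    for bar, correct in activations:
        chunk = min(bar * n_chunks // n_bars, n_chunks - 1)
        totals[chunk] += 1
        hits[chunk] += int(correct)
    return [h / t if t else float("nan") for h, t in zip(hits, totals)]


acts = [(5, True), (8, True), (15, False), (18, True), (25, False), (28, False)]
print(precision_by_chunk(acts, n_bars=30, n_chunks=3))  # [1.0, 0.5, 0.0]
```

A stable leaf would show similar precision in every chunk; the toy leaf above degrades over time and would be rejected by such a filter.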

elibrarius:
With an RNG like this you can shuffle properly: https://www.mql5.com/ru/blogs/post/735953

By the way, have you seen a generator that outputs the numbers from an array in random order without repeats? That's exactly the kind of generator I need.
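What is being asked for is a random permutation: each element of the array comes out exactly once, in random order. The standard Fisher-Yates (Knuth) shuffle does this in O(n) with one random integer per step. A minimal Python sketch; in MQL5 the `randrange` call would need a generator that is uniform over 0..i, since MathRand() alone covers only 0..32767.

```python
import random
from typing import Iterator, List, MutableSequence, TypeVar

T = TypeVar("T")


def fisher_yates(items: MutableSequence[T], rng: random.Random) -> MutableSequence[T]:
    """Shuffle `items` in place so that every permutation is equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform index in 0..i
        items[i], items[j] = items[j], items[i]
    return items


def draw_without_repeats(values: List[T], seed: int = 0) -> Iterator[T]:
    """Yield the values in random order, each exactly once."""
    yield from fisher_yates(list(values), random.Random(seed))


order = list(draw_without_repeats(list(range(10)), seed=7))
print(order)
```

Python's own `random.shuffle` implements the same algorithm; the explicit loop is shown because it ports directly to MQL5.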
