Model struggles with large dataset invariance and training speed

[Deleted] 2024.09.20 08:25 #36221

Over a cup of coffee today.

If I'm not mistaken, I'll take 1st place :) I'll submit later.

fxsaber 2024.09.20 08:39 #36222

Maxim Dmitrievsky #:

If I'm not mistaken, I'll take 1st place :) I'll submit later.

https://hub.crunchdao.com/competitions/causality-discovery/leaderboard

Forum on trading, automated trading systems and testing trading strategies

Machine Learning in Trading: Theory, Models, Practice and Algorithm Trading

Maxim Dmitrievsky, 2024.09.19 14:33

Less than a minute in Colab, CatBoost.

If successful, it will be an excellent demonstration of the superiority of quality over quantity.


mytarmailS 2024.09.20 11:05 #36223

Maxim Dmitrievsky #:

Today, over a cup of coffee.

If I'm not mistaken, I'll take 1st place :) I'll submit later.

It's stressful.

[Deleted] 2024.09.20 11:10 #36224

mytarmailS #:

It's stressful.

Sure, but a separate test sample is hard to fool. Their server is throwing errors today and I can't download anything. The tension is rising.

mytarmailS 2024.09.20 11:57 #36225

Maxim Dmitrievsky #:

Sure, but a separate test sample is hard to fool. Their server is throwing errors today and I can't download anything. The tension is rising.

It's not a big deal.

Tension is when you haven't yet implemented an idea that will "definitely work" based on your experience from previous contests, and there are 5 hours left until the end of the contest)).


[Deleted] 2024.09.20 12:19 #36226

mytarmailS #:

It's no big deal.

Tension is when you haven't yet implemented an idea that will "definitely work" based on your experience from previous contests, and there are 5 hours left until the end of the contest)).

Found a catch. As a test, I trained on parts of the dataset, and everything was fine there.

But the main dataset is very large and the features aren't invariant enough, so training starts to spin its wheels. In the end it can't get up to the same high speed.

Maybe it can be fixed, for example by splitting the sample into sub-datasets and then stacking the models.

Without that, it sticks at the same magic 0.39-0.40 on validation. The dataset is cool; it throws you off in every direction.
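A minimal sketch of the "split into sub-datasets, then stack the models" idea, in dependency-free Python. It is not the poster's code: the decision-stump base learner, the logistic-regression meta-model, the 25% blending hold-out and the names `train_stump`, `fit_meta` and `stack_on_subsets` are all assumptions for illustration (in practice each base model would more likely be a CatBoost model trained on its chunk). The meta-model is fit only on rows that none of the base models saw, so it does not learn from their in-sample fit.

```python
import math
import random


def train_stump(X, y):
    """Brute-force a one-feature threshold classifier (decision stump)."""
    best_acc, best_rule = -1.0, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                acc = sum((1 if sign * (row[f] - t) > 0 else 0) == label
                          for row, label in zip(X, y)) / len(y)
                if acc > best_acc:
                    best_acc, best_rule = acc, (f, t, sign)
    f, t, sign = best_rule
    return lambda row: 1 if sign * (row[f] - t) > 0 else 0


def fit_meta(P, y, lr=0.5, epochs=500):
    """Logistic-regression meta-model on base-model outputs (plain gradient descent)."""
    w, b, n = [0.0] * len(P[0]), 0.0, len(y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for p, label in zip(P, y):
            z = b + sum(wi * pi for wi, pi in zip(w, p))
            err = 1 / (1 + math.exp(-z)) - label  # d(log-loss)/dz
            gb += err
            for i, pi in enumerate(p):
                gw[i] += err * pi
        b -= lr * gb / n
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
    return w, b


def stack_on_subsets(X, y, n_subsets=3, holdout=0.25, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - holdout))
    train_idx, blend_idx = idx[:cut], idx[cut:]
    # Disjoint sub-datasets: each base model sees only its own chunk.
    chunks = [train_idx[i::n_subsets] for i in range(n_subsets)]
    base = [train_stump([X[i] for i in c], [y[i] for i in c]) for c in chunks]
    # The meta-model is trained on base-model outputs for held-out rows only.
    P = [[m(X[i]) for m in base] for i in blend_idx]
    w, b = fit_meta(P, [y[i] for i in blend_idx])

    def predict(row):
        z = b + sum(wi * m(row) for wi, m in zip(w, base))
        return 1 if z > 0 else 0  # same as sigmoid(z) > 0.5
    return predict, base


# Synthetic toy data (an assumption): the label depends only on feature 0.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(800)]
y = [1 if row[0] > 0.5 else 0 for row in X]
model, base_models = stack_on_subsets(X, y)

X_test = [[rng.random(), rng.random()] for _ in range(200)]
y_test = [1 if row[0] > 0.5 else 0 for row in X_test]
acc = sum(model(r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(len(base_models), round(acc, 3))
```

Fitting the meta-model on a separate blending split, rather than on the base models' own training rows, is the usual way to keep stacking honest; out-of-fold predictions would be the more data-efficient alternative.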


mytarmailS 2024.09.20 13:39 #36227

Maxim Dmitrievsky #:

Maybe it can be fixed, for example by splitting the sample into sub-datasets and then stacking the models.

Well, you can cluster the dataset and train a model for each cluster, although I don't believe this will do anything.
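The clustering suggestion can be sketched roughly as: run k-means, fit one model per cluster, and route each new row to the model of its nearest centroid. This is only an illustration under stated assumptions: plain Lloyd's k-means with Euclidean distance, hypothetical helper names (`kmeans`, `fit_per_cluster`, `fit_majority`), synthetic two-blob data, and a majority-label placeholder standing in for a real per-cluster learner such as CatBoost. It needs Python 3.8+ for `math.dist`.

```python
import math
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the closest centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in X:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def fit_majority(X, y):
    """Placeholder per-cluster learner: always predicts the most common label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def fit_per_cluster(X, y, k, fit_model=fit_majority):
    centroids = kmeans(X, k)
    members = [[] for _ in range(k)]
    for i, p in enumerate(X):
        members[nearest(p, centroids)].append(i)
    fallback = fit_model(X, y)  # used if a cluster ends up empty
    models = [fit_model([X[i] for i in m], [y[i] for i in m]) if m else fallback
              for m in members]

    def predict(row):
        # Route the row to its nearest centroid, then ask that cluster's model.
        return models[nearest(row, centroids)](row)
    return predict


def blobs(rng, n):
    """Hypothetical toy data: label 0 around (0, 0), label 1 around (5, 5)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 5.0 * label
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(7)
X_train, y_train = blobs(rng, 400)
X_test, y_test = blobs(rng, 200)
model = fit_per_cluster(X_train, y_train, k=2)
acc = sum(model(r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(round(acc, 3))
```

Swapping `fit_majority` for any function with the same `(X, y) -> predict` shape turns this into a per-cluster ensemble of real models; whether that beats a single model on the full dataset is exactly the open question in the thread.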

[Deleted] 2024.09.20 13:48 #36228

mytarmailS #:

Well, you can cluster the dataset and train a model for each cluster, although I do not believe that this will do anything

Not today. Yes.

mytarmailS 2024.09.20 13:58 #36229

mytarmailS #:

Well, you can cluster the dataset and train a model for each cluster, although I do not believe that this will do anything

It's a tree, after all; it will create whatever clusters it needs internally.
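To illustrate the point, here is a sketch of a small greedy Gini decision tree (plain Python, hypothetical names, synthetic data) trained on data with two hidden regimes: the `x1` cut-off that decides the label is 0.3 when `x0 < 0.5` and 0.7 otherwise. The tree is never told about the regimes, yet a depth-3 tree typically ends up splitting on `x0` as well, so its leaves carve out regime-specific regions. This is the sense in which a tree "makes the clusters it needs internally". It is a single tree, not CatBoost's boosted ensemble, so treat it only as an illustration of the partitioning idea.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y):
    """Exhaustive search for the (feature, threshold) with the lowest weighted Gini."""
    best = None  # (weighted_gini, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(X, y, depth):
    """Return ('leaf', label) or ('split', feature, threshold, left, right)."""
    split = best_split(X, y) if depth > 0 and gini(y) > 0.0 else None
    if split is None:
        return ("leaf", round(sum(y) / len(y)))
    _, f, t = split
    L = [i for i, row in enumerate(X) if row[f] <= t]
    R = [i for i, row in enumerate(X) if row[f] > t]
    return ("split", f, t,
            build([X[i] for i in L], [y[i] for i in L], depth - 1),
            build([X[i] for i in R], [y[i] for i in R], depth - 1))


def leaf_of(tree, row, path=""):
    """Return (leaf path such as 'LR', predicted label) for one row."""
    if tree[0] == "leaf":
        return path, tree[1]
    _, f, t, left, right = tree
    if row[f] <= t:
        return leaf_of(left, row, path + "L")
    return leaf_of(right, row, path + "R")


def features_used(tree):
    if tree[0] == "leaf":
        return set()
    return {tree[1]} | features_used(tree[3]) | features_used(tree[4])


rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(300)]
# Two hidden regimes: the x1 cut-off that decides the label depends on x0.
y = [int(r[1] > (0.3 if r[0] < 0.5 else 0.7)) for r in X]
tree = build(X, y, depth=3)
acc = sum(leaf_of(tree, r)[1] == lab for r, lab in zip(X, y)) / len(y)
regions = {leaf_of(tree, r)[0] for r in X}
print(round(acc, 3), sorted(features_used(tree)), sorted(regions))
```

Each leaf path is one axis-aligned region of the feature space; grouping rows by `leaf_of(tree, row)[0]` gives the "clusters" the tree found on its own.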

Evgeni Gavrilovi 2024.09.20 14:14 #36230

Maxim Dmitrievsky #:

If I'm not mistaken, I'll take 1st place.

That will be very difficult. Most likely the famous expert Alexander Molak is in first place; he has written a book on this topic.

Machine learning in trading: theory, models, practice and algo-trading - page 3623