Machine learning in trading: theory, models, practice and algo-trading - page 3285

 
Aleksey Vyazmikin #:

And here's the result - the last two columns

Indeed, the results have improved. A tentative hypothesis: the larger the sample, the better the training result.

The next step is to train on parts 1 and 2 of the training sample: if the results are not much worse than on parts 2 and 3, then sample freshness can be considered less significant than sample volume.

Well, the training is finished; the results are in the table below (last two columns).


We can tentatively conclude that the success of training does depend on the sample size. However, I note that the results of the "-1p1-2" sample are comparable to, and by some criteria even better than, the "-1p2-3" sample, while for the "0p1-2" sample the results are twice as bad in terms of the number of models meeting the given criterion.

Now I have run a sample with inverted chronology, where the train sample consists of the initial exam+test+train_p3 sample, the test sample is train_p2, and exam is train_p1. The goal is to see whether it is possible to build a successful model on more recent data that would have worked 10 years ago.

What do you think the result will be?
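For readers following along, here is a minimal sketch of the two split schemes being compared. The file name and the assumption of five equal chronological chunks are hypothetical; the post does not specify either:

```python
import numpy as np
import pandas as pd

# Hypothetical file: the full history, oldest rows first.
df = pd.read_csv("sample.csv")

# Cut the history into five chronological chunks, named as in the post
# (in the real experiment the chunks need not be equal in size).
train_p1, train_p2, train_p3, test, exam = np.array_split(df, 5)

# Normal chronology: fit on the oldest data, hold out the newest.
train_fwd = pd.concat([train_p1, train_p2, train_p3])
fwd = {"train": train_fwd, "test": test, "exam": exam}

# Inverted chronology, as in the experiment: fit on exam+test+train_p3
# (the newest data), test on train_p2, examine on train_p1 (the oldest).
train_rev = pd.concat([exam, test, train_p3])
rev = {"train": train_rev, "test": train_p2, "exam": train_p1}
```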

 
Aleksey Vyazmikin #:

Well, the training is finished; the results are in the table below (last two columns). [...]

What do you think the result will be?

A little more and the most trivial result will be obtained... or maybe it will not be, but then we get a discovery that will turn the world of ML upside down!

Way to go!

 
СанСаныч Фоменко #:

I have written many times about the "predictive power of predictors", which is calculated as the distance between two vectors.

I came across a list of tools for calculating the distance:

That is in addition to the standard one, which has its own set of distances.

Nice collection.
 
Here's a task with no input: ...
What do you think the result will be? 😀

Just like before: here are the values of the features without the features themselves...

And then he will write: no one guessed, and the result turned out like this 😁😁😁😁🥳
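As an aside, СанСаныч does not say which two vectors he compares, but a common reading of "predictive power as a distance" is the distance between a predictor's class-conditional distributions. A minimal sketch under that assumption, on toy data, using standard SciPy distances (the binning choice is hypothetical):

```python
import numpy as np
from scipy.spatial import distance
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                                # toy features
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)   # only feature 0 is informative

def predictive_power(x, y, bins=20):
    """Distances between a feature's class-conditional histograms."""
    lo, hi = x.min(), x.max()
    h0, _ = np.histogram(x[y == 0], bins=bins, range=(lo, hi))
    h1, _ = np.histogram(x[y == 1], bins=bins, range=(lo, hi))
    h0 = h0 / h0.sum()                                        # normalise to probability vectors
    h1 = h1 / h1.sum()
    return {
        "euclidean":     distance.euclidean(h0, h1),
        "cosine":        distance.cosine(h0, h1),
        "jensenshannon": distance.jensenshannon(h0, h1),
        "wasserstein":   wasserstein_distance(x[y == 0], x[y == 1]),
    }

for j in range(X.shape[1]):
    print(f"feature {j}: {predictive_power(X[:, j], y)}")
```

On this toy data every distance comes out largest for feature 0, the only informative one, which is the behaviour a "predictive power" score should have.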
 
Maxim Dmitrievsky #:
Here's a task with no input: ...
What do you think the result will be? 😀 [...]

Max, I don't understand why you're making fun of me.

If there are no assumptions, don't say anything; if there are, say them, like "the result will suck".

 
Aleksey Vyazmikin #:
...

What do you think the result will be?

I don't know, but I'm curious to know.

 
СанСаныч Фоменко #:

Just a little more and a trivial result will be obtained... or maybe not, but then a discovery that will turn the world of ML upside down!

Way to go!

So you think the number of models will be comparable in the first two columns, even though they differ by a factor of two? Please be more specific about the triviality.

 
Andrey Dik #:

Max, I don't understand why you're making fun of me.

If there are no assumptions - don't say anything, if there are - say it, like "the result will suck".

I wrote above about mathematical statistics. Before that I wrote about causal inference. Even before that I wrote about oracle errors (markup errors), when data is marked up in a way you don't understand. What follows directly from all of this is the realisation that the results will vary across different chunks and lengths of training history. It depends on the data, which is not provided or described.
Markup errors affect both the results and the time periods. Whichever paw the chicken scratched out the markup with, that is the result the chicken will get.

People here like to talk about the basic pillars of learning: preprocessing, quantisation, the relation of predictors to targets... But they don't write about which paw the markup was done with, left or right. More depends on that than on all of the above.
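Maxim's point about markup errors is easy to reproduce on toy data: flip a share of the training labels while leaving the features untouched, and watch out-of-sample quality degrade. A minimal sketch (synthetic data, not trading data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.1, 0.2, 0.3, 0.4):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise    # the "wrong paw": corrupt a share of labels
    y_noisy[flip] ^= 1
    model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_noisy)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"label noise {noise:.0%}: test accuracy {acc:.3f}")
```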
 
Maxim Dmitrievsky #:
I wrote above about mathematical statistics, before that about causal inference, and even earlier about oracle errors (markup errors)... [...] More depends on that than on all of the above.

Well, that already sounds like a pro's opinion (whether it's right or wrong is another question).
And there's no need to make fun.
 
Andrey Dik #:

Well, that already sounds like a pro's opinion (whether it's right or wrong is another question).
and there's no reason to make fun of it.
Absolutely correct, and that is the biggest problem. Active learning was even invented because of the lack of resources for proper markup (it is usually the most expensive part): the algorithm itself tries to mark up the dataset as truthfully as it can, with the help of human annotators. In our case, the markup is whether to buy or sell.

And then people fiddle with their own markup errors. It's just obvious.
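For reference, a minimal sketch of the uncertainty-sampling flavour of active learning described here: the model itself picks the examples it is least sure about and asks the "annotator" (here, the known labels of toy data) to mark them up. In trading, the annotator would be whatever oracle decides buy or sell:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
labeled = list(range(20))                      # tiny seed markup
pool = list(range(20, len(y)))                 # unlabelled pool

model = LogisticRegression(max_iter=1000)
for _ in range(10):                            # ten query rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])[:, 1]
    # Query the points the model is least certain about (closest to 0.5).
    query = np.argsort(np.abs(proba - 0.5))[:20]
    for q in sorted(query, reverse=True):      # pop from the end first
        labeled.append(pool.pop(q))            # "annotator" reveals the true label
print(f"accuracy on the remaining pool: {model.score(X[pool], y[pool]):.3f}")
```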