Machine learning in trading: theory, models, practice and algo-trading - page 1330

 
Aleksey Vyazmikin:

What puzzles me more is another question: why are the graphs so similar across different models and different samples? It looks as if the models manage to catch some obvious pattern that recurs with a regular period and shows up at different sample sizes (at least this stretch is always inside the window), and it is this pattern that the models exploit.

For myself I concluded that anywhere from 30% to 70% of all the data can be allocated to the validation area when searching for interesting patterns, but the optimum still seems to be 30%.

Maybe because you have the same model, just with a different seed? ))

If the model is randomized, that doesn't mean the starting value of the generator will strongly affect the result.

Normal models won't change much at all; a completely random one will. This is just a stability test.

All these conclusions could have been reached without doing anything at all - no experiments, purely from theory.

The 30/70 results are purely random. The conclusion is that anything between 30 and 70 is asymptotically close to 50; you just happened to get such a subsample.
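The stability test mentioned above can be sketched directly: retrain the same randomized model under several seeds and look at the spread of the score. Everything below is an illustrative toy, not anyone's actual setup - the point being that a model which has genuinely captured the pattern barely reacts to the seed:

```python
import random
import statistics

def train_and_score(data, labels, seed):
    # Toy stand-in for a randomized learner: the seed only jitters
    # the decision threshold, mimicking random subsampling inside training.
    rng = random.Random(seed)
    threshold = 0.5 + rng.uniform(-0.05, 0.05)
    preds = [1 if x > threshold else 0 for x in data]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Well-separated classes: a "good" model, insensitive to the seed.
easy_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9] * 10
easy_y = [0, 0, 0, 1, 1, 1] * 10

scores = [train_and_score(easy_x, easy_y, seed) for seed in range(20)]
print(min(scores), max(scores), statistics.pstdev(scores))
```

On this separable toy data every seed yields the same score, so the spread is zero; a near-random model would scatter instead.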

 
Maxim Dmitrievsky:

Maybe because you have the same model, just with a different seed? ))

If the model is randomized, that doesn't mean the starting value of the generator will strongly affect the result.

Normal models won't change much at all; a completely random one will. This is just a stability test.

All these conclusions could have been reached without doing anything at all - no experiments, purely from theory.

If you look carefully, you can see that the financial results of the models on the same sample vary greatly - from 5000 to 1500, i.e. significantly - which means that the seed does affect the models. My guess is that it is the selected models that are similar (I'll check); they differ slightly in profit, yet almost all of them go flat in the middle section, which is surprising - they make the same mistakes in the same places (an anomaly in the new data?).

I don't understand the statement "normal models won't change at all, completely random ones will" - the second part contradicts the first.

Maxim Dmitrievsky:

The 30/70 results are purely random. The conclusion is that anything between 30 and 70 is asymptotically close to 50; you just happened to get such a subsample.

That's exactly the point - random or not. Does it depend on the content of the sample in that area, or on the amount of data in the samples? That's what we need to understand: which has more influence.
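Both factors - what lands in the validation area and how much of it there is - can be varied independently by repeating the split at several fractions and with several shuffle seeds. A minimal sketch (the function name and data are mine, purely illustrative):

```python
import random

def split(rows, valid_frac, seed):
    # Shuffle with a fixed seed, then hold out the last valid_frac
    # of the rows as the validation area.
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - valid_frac))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))
for frac in (0.3, 0.5, 0.7):      # vary the amount of validation data...
    for seed in (0, 1, 2):        # ...and, separately, its content
        train, valid = split(rows, frac, seed)
        # evaluate the model on `valid` here and compare the two effects
        print(frac, seed, len(train), len(valid))
```

Holding the fraction fixed while changing the seed isolates the "content" effect; holding the seed fixed while changing the fraction isolates the "amount of data" effect.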

 
Aleksey Vyazmikin:

If you look carefully, you can see that the financial results of the models on the same sample vary greatly - from 5000 to 1500, i.e. significantly - which means that the seed does affect the models. My guess is that it is the selected models that are similar (I'll check); they differ slightly in profit, yet almost all of them go flat in the middle section, which is surprising - they make the same mistakes in the same places (an anomaly in the new data?).

I don't understand the statement "normal models won't change at all, completely random ones will" - the second part contradicts the first.

That's exactly the point - random or not. Does it depend on the content of the sample in that area, or on the amount of data in the samples? That's what we need to understand: which has more influence.

Models with low error, i.e. good-quality models, are not affected by a change of seed. If your model is near random, around 0.5, you will get a lot of different models, because it overfits to every sneeze of the random generator.

 
Maxim Dmitrievsky:

Models with low error, i.e. good-quality models, are not affected by a change of seed. If your model is near random, around 0.5, you will get a lot of different models, because it overfits to every sneeze of the random generator.

That is probably true at 99% accuracy, but my recall is low - 20% for a good trade - i.e. potentially most of the 1s are not detected and there are no entries, so different models can be expected to operate anywhere in the 0 to 100 range with a 20% window.
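For reference, a 20% recall on the 1-class means roughly the following (the counts below are hypothetical, not the actual confusion matrix): out of every 100 genuine entry signals, only about 20 are detected, and the other 80 simply produce no trade.

```python
def recall(true_positives, false_negatives):
    # Recall = share of actual positives (class 1) the model detected.
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts consistent with ~20% recall on the 1-class:
# 20 good entries found, 80 missed entirely.
print(recall(20, 80))  # 0.2
```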

 
Aleksey Vyazmikin:

That is probably true at 99% accuracy, but my recall is low - 20% for a good trade - i.e. potentially most of the 1s are not detected and there are no entries, so different models can be expected to operate anywhere in the 0 to 100 range with a 20% window.

This is not the way to go - you should reduce the overall model error instead of reinventing the wheel.

Then all these odd approaches will fall away by themselves.

I've written it 50 times - no need to reinvent the wheel; it's a road to nowhere.
 
Maxim Dmitrievsky:

This is not the way to go - you should reduce the overall model error instead of reinventing the wheel.

Then all these odd approaches will fall away by themselves.

I've told you 50 times - no need to reinvent the wheel.

I am listening carefully - what else can be used to reduce the error?

To that end I change the sample composition and change the model-building settings - what else can I do?

 

For anyone wondering how the seed affects the models: I took a 30% sample and trained all the models - the animation opens by clicking on the picture.


 
Maxim Dmitrievsky:

This is not the way to go - you should reduce the overall model error instead of reinventing the wheel.

Then all these odd approaches will fall away by themselves.

I've written it 50 times - no need to reinvent the wheel; it's a road to nowhere.
I disagree. If the standard ML methods worked on the market, everyone would be making money with them.
Aleksey Vyazmikin:

But the wheel-reinventing should be done in the daytime - and at night you should sleep. Take care of your health.
 
elibrarius:
I disagree. If the standard ML methods worked on the market, everyone would be making money with them.
But the wheel-reinventing should be done in the daytime - and at night you should sleep. Take care of your health.

The problem is not with the standard methods but with a basic lack of understanding of what you are trying to do with them and of the process you are working with,

i.e. with the lack of both an economic and a mathematical education.

So your actions are like the wandering of a Brownian particle... maybe this way, maybe that way...

And everyone refuses to read the "complicated" books, especially the ones in English.

 
Maxim Dmitrievsky:

The problem is not with the standard methods but with a basic lack of understanding of what you are trying to do with them.

A prime example of such folly is using the ZigZag as the target output.

Reshetov's kernel machine is just such a bicycle, and some people here use it. It seems to cope with the market more successfully than the standard tools.

So I am for bicycles! ) But of course you also need to understand what to do with them.
