Machine learning in trading: theory, models, practice and algo-trading - page 932

 
elibrarius:
Copy it to your blog, maybe it will be useful to someone else. Finding anything here later is unrealistic.

That code is in the blog too. But it's fairly basic "how to do k-fold" stuff; there's a lot to adapt for your specific tasks. Also, if you enable multithreading in the genetic optimization, elmnn will ignore the PRNG seed you set and each thread will get something different and unreproducible. For that case (multithreading in genetics) see Vladimir Perervenko's articles and his way of controlling the PRNG.
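For the reproducibility part, a minimal sketch of the same idea in Python (my own illustration, not Perervenko's R code): fix one explicit base seed and derive a deterministic seed per fold, so the result does not depend on which thread picks up which fold.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

BASE_SEED = 12345                      # assumed constant; any fixed value works
kf = KFold(n_splits=5, shuffle=True, random_state=BASE_SEED)

scores = []
for fold_id, (tr, te) in enumerate(kf.split(X)):
    seed = BASE_SEED + fold_id         # distinct but deterministic seed per fold
    model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=seed)
    model.fit(X[tr], y[tr])
    scores.append(round(model.score(X[te], y[te]), 3))

print(scores)                          # identical on every rerun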

 
Maxim Dmitrievsky:

the extra dimension is still there, and you still have to draw a curve through it somehow, possibly with a large error

dropout, on the contrary, increases the error, doesn't it?

Dropout is equivalent to switching off a neuron. If the neuron is noisy, that's good.
And why would anyone invent something that increases the error? Everything makes sense only as long as it helps reduce the error.
 
elibrarius:
Dropout is equivalent to switching off a neuron. If the neuron is noisy, that's good.
And why would anyone invent something that increases the error? Everything makes sense only as long as it helps reduce the error.

Ahem... by increasing the error it seems to remove the overfitting. Not always, but how else could it be?

That's why they say 0.5 is a good dropout value for forex. For me it works more or less at 0.3-0.4; with less than that it usually overfits.

I mean, the trick is clear, right? They think they're being clever, but in reality they just train the model poorly, and it more or less fails them because the trading ends up semi-random.
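For reference, a minimal PyTorch sketch (my own illustration, not anyone's trading model) of what that dropout rate actually is: p is the fraction of hidden neurons randomly zeroed on each training pass, with 0.3-0.5 being the kind of values discussed above.

import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, n_inputs=50, n_hidden=50, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Dropout(p=p_drop),      # randomly "switches off" hidden neurons during training
            nn.Linear(n_hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

model = SmallNet(p_drop=0.5)
model.train()                          # dropout active
x = torch.randn(8, 50)
print(model(x).shape)                  # torch.Size([8, 1])
model.eval()                           # dropout disabled for inference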

 
Dr. Trader:

It should be in Python too.

I see; Python is really not my thing at all... Doesn't anyone make GUI wrappers for working with neural networks?

Dr. Trader:

I'll run the algorithm tonight and show what came out tomorrow.

Thank you, it will be informative.

Dr. Trader:

Achieving 100% accuracy and hoping the model will work well is usually impossible in forex; it takes months of fitting the predictors, and even the target, so that everything comes together well. Usually, once accuracy gets a couple of tens of percent above 50%, the model starts memorizing training examples instead of finding logical patterns in them, and accordingly its results on new data get worse and worse. On your data I got an optimum of about 60% - the point where the result on the training and the test data is about the same, whereas with more detailed tree splitting and higher training accuracy the forest shows increasingly worse results on new data.

What's the rush? There's time, as long as it isn't wasted.

However, I don't quite understand the difference between memorizing and learning. In my view, all this ML should find features in the data set, compare the expected result with the memorized one while allowing for deformations of the data, and give its forecast depending on that deformation. At least that was my idea before I started poking around in all of this.

I'm just surprised that the tree gets built with different sets of predictors, which means that not all the data and knowledge is examined when it is built, and it's this fact that lets me assume that further branching is permissible.
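A rough synthetic illustration of what Dr. Trader describes above (my own sketch, not his data or code): past some tree depth the accuracy on the training set keeps climbing while the accuracy on held-out data stalls or degrades - that is the memorizing.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# noisy labels to mimic a low signal-to-noise ratio
X, y = make_classification(n_samples=2000, n_features=30, n_informative=5,
                           flip_y=0.3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

print("depth train test")
for depth in (2, 4, 8, 16, None):      # None = grow trees until pure
    rf = RandomForestClassifier(n_estimators=200, max_depth=depth, random_state=1)
    rf.fit(X_tr, y_tr)
    print(depth, round(rf.score(X_tr, y_tr), 3), round(rf.score(X_te, y_te), 3))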

 
Maxim Dmitrievsky:

Ahem... by increasing the error it seems to remove the overfitting. Not always, but how else could it be?

That too. More precisely, that's its main task. I had formed an association between removing neurons and removing inputs. Maybe it isn't correct.

Yes, it's rather dropconnect in the first layer that is like removing inputs.

 
elibrarius:

That too. More precisely, that's its main task. I had formed an association between removing neurons and removing inputs. Maybe it isn't correct.

Removing neurons is removing degrees of freedom = an increase in error, a coarsening.

If you remove a few terms from a regression, that's tantamount to removing inputs, but in a fully connected NN, why would it be?

Dropconnect, judging by the name - yes, it does seem so.
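A hand-rolled numpy sketch of the distinction (my own illustration, not a library API): dropconnect zeroes individual first-layer weights at random, whereas zeroing a whole column of that weight matrix is exactly the same as removing the corresponding input.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(50, 50))         # first layer: weight[i, j] goes from input j to neuron i
x = rng.normal(size=50)

# dropconnect: keep each individual connection with probability 0.5
mask = rng.random(W1.shape) < 0.5
h_dropconnect = (W1 * mask) @ x

# removing input j: kill every connection leaving that input (one whole column)
j = 7
W1_cut = W1.copy()
W1_cut[:, j] = 0.0                     # equivalent to feeding x with x[j] deleted
h_no_input = W1_cut @ x

print(h_dropconnect[:3])
print(h_no_input[:3])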

 
Maxim Dmitrievsky:

Removing neurons is removing degrees of freedom = an increase in error, a coarsening.

If you remove a few terms from a regression, that's tantamount to removing inputs, but in a fully connected NN, why would it be?

Dropconnect, judging by the name - yes, it does seem so.

But with dropconnect the problem gets even harder... For example, 50 inputs and 50 neurons.
Removing 1 input removes its 50 connections to the 50 neurons.
And enumerating the deletion of all 50 connections in order to remove 1 input is 50 times harder. And going through all 50 inputs that way... that's 50^50 variants. Clearly a hopeless task. Going through the inputs is easier - only 2^50 ))).
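Just to put numbers on that scale (a back-of-the-envelope check in Python; the exact count of connection patterns depends on how you enumerate them, but the gap is the point):

n_inputs = 50
n_neurons = 50

input_subsets = 2 ** n_inputs                         # keep/drop each input
connection_subsets = 2 ** (n_inputs * n_neurons)      # keep/drop each of the 2500 connections

print(f"{input_subsets:.3e}")                         # about 1.1e+15
print(len(str(connection_subsets)), "decimal digits") # roughly 753 digits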
 
elibrarius:

The question is not just for you, but for everyone.

In practice it really is so: if there are noise predictors, the NN can't get above 50-55%. If you select the predictors well, it can even give 70%.

But why is this so?
1) After all, during training the NN should automatically assign weights close to 0 to the noise predictors (which is equivalent to excluding them from the selection). We saw that in the problem at the beginning of the thread.
2) If not by down-weighting during training, then at least dropout should sift them out...

It has been written many times: noise predictors are much friendlier to the model - there are always values in the noise that improve the training result. So the process runs in reverse - the noise predictors get more weight, not less as you suggest. This is especially noticeable on small samples, under 1000 observations. Samples of more than 5000 observations are not affected as much, but you still have to pre-screen out the noise predictors.
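A quick synthetic illustration of that effect (my own sketch, not SanSanych's procedure): on a small sample, pure-noise columns routinely pick up non-trivial importance in a forest, which is the argument for pre-screening them out before training.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=300, n_features=5, n_informative=5,
                           n_redundant=0, random_state=42)
noise = rng.normal(size=(300, 20))     # 20 pure-noise predictors
X_all = np.hstack([X, noise])

rf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_all, y)
imp = rf.feature_importances_
print("informative:", imp[:5].round(3))
print("noise, top 5:", np.sort(imp[5:])[::-1][:5].round(3))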

 
elibrarius:
But with dropconnect the problem gets even harder... For example, 50 inputs and 50 neurons.
Removing 1 input removes its 50 connections to the 50 neurons as well.
And enumerating the deletion of all 50 connections in order to remove 1 input is 50 times harder. And going through all 50 inputs that way... Clearly a hopeless task. Going through the inputs is easier.
The inputs are more important anyway; playing with models is already shamanism and in the general case shouldn't give much gain in theory. Well, I made an ensemble of forests: by dropping individual forests I can improve the error (the difference between train and test) by 0.05, sometimes 0.1. It doesn't solve the main problem. How it goes with sophisticated neural networks, I don't know.
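A rough sketch of that "ensemble of forests, drop the bad ones" idea (the selection rule here is my own assumption; the poster's exact procedure isn't shown): build several forests with different seeds, measure each one's train/validation gap, keep only the better half and average their predictions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=30, flip_y=0.3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

forests = [RandomForestClassifier(n_estimators=100, max_depth=6, random_state=s).fit(X_tr, y_tr)
           for s in range(10)]

# keep the forests whose train/validation gap is at or below the median gap
# (in practice the selection set should be separate from the final evaluation set)
gaps = [f.score(X_tr, y_tr) - f.score(X_va, y_va) for f in forests]
kept = [f for f, g in zip(forests, gaps) if g <= np.median(gaps)]

proba = np.mean([f.predict_proba(X_va)[:, 1] for f in kept], axis=0)
print("kept", len(kept), "of", len(forests),
      "accuracy:", round(((proba > 0.5) == y_va).mean(), 3))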
 
SanSanych Fomenko:

It has been written many times: noise predictors are much friendlier to the model - there are always values in the noise that improve the training result. So the process runs in reverse - the noise predictors get more weight, not less as you suggest. This is especially noticeable on small samples, under 1000 observations. Samples of more than 5000 observations are not affected as much, but you still have to pre-screen out the noise predictors.

Maybe there is a technique that lets you prioritize predictors for use in an NN/tree/forest according to their importance from the analyst's point of view?
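One possible way to encode that kind of analyst priority (an illustrative trick of my own, not a standard named technique): rescale each predictor by a prior weight before an L1-penalised model, so that low-priority predictors get pushed out of the model sooner.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=3)
X = StandardScaler().fit_transform(X)

# hypothetical analyst weights: the first three predictors are considered important
prior = np.array([1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2])

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X * prior, y)
print(clf.coef_.round(2))              # down-weighted predictors are zeroed out first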
