How to improve the quality of data in a regular set: I'm not from Oabotschneonsteyan, and - General

Valeriy Yastremskiy 2020.08.09 17:21 #19621

Maxim Dmitrievsky:

several D-neurons (like a network)

error, % = 45.10948905109489

goodbye )

I emailed the author of the network my cuts and my indignation.

What did you determine? The authenticity of the banknotes?

Maxim Dmitrievsky 2020.08.09 17:22 #19622

Valeriy Yastremskiy:
What did you determine? The authenticity of the banknotes?

Yes

Valeriy Yastremskiy 2020.08.09 17:27 #19623

Maxim Dmitrievsky:

Yes

Flawed logic.

Maxim Dmitrievsky 2020.08.09 17:29 #19624

Valeriy Yastremskiy:
Flawed logic.

There may be some pitfalls. For example, you cannot use negative values in the features, because he uses binarized ones in his microtests. The sparse description says nothing about this, and no errors are reported.

Valeriy Yastremskiy 2020.08.09 17:44 #19625

Maxim Dmitrievsky:

There may be some pitfalls. For example, you cannot use negative values in the features, because he uses binarized ones in his microtests. The sparse description says nothing about this, and no errors are reported.

Flaws often come from something implicit, and detecting them in seemingly correct logic is quite a problem.

Valeriy Yastremskiy 2020.08.09 18:01 #19626

Weights on one side and binarity on the other. That's what we've arrived at.

Rorschach 2020.08.10 08:35 #19627

An interesting neural network approach to collaborative filtering

Instead of people and movie IDs you can take trading instruments and strategies, and instead of ratings some metric (mathematical expectation, etc.). Then calculate latent variables for the instrument and the strategy. After that you can do whatever you want: select a system for an instrument, generate one on the fly with the required characteristics, build synthetics for a system...
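The idea of latent variables for instruments and strategies can be sketched with plain matrix factorization, a common collaborative filtering baseline (not necessarily the neural network approach referred to above). Everything here is an assumption made for illustration: the toy metric values, the function names `factorize` and `predict`, and the hyperparameters.

```python
import random

def factorize(observed, n_rows, n_cols, k=2, lr=0.05, reg=0.02, epochs=2000, seed=0):
    """Fit latent vectors by SGD on observed (row, col, value) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]  # instruments
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cols)]  # strategies
    for _ in range(epochs):
        for i, j, value in observed:
            err = value - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Predicted metric for instrument i traded with strategy j."""
    return sum(a * b for a, b in zip(P[i], Q[j]))

# Hypothetical data: (instrument index, strategy index, metric value).
observed = [
    (0, 0, 1.2), (0, 1, 0.3),
    (1, 0, 1.0), (1, 2, -0.4),
    (2, 1, 0.2), (2, 2, -0.5),
]
P, Q = factorize(observed, n_rows=3, n_cols=3)
train_mse = sum((v - predict(P, Q, i, j)) ** 2 for i, j, v in observed) / len(observed)
missing = predict(P, Q, 0, 2)  # estimate for a pair that was never observed
print(train_mse, missing)
```

The rows of `P` and `Q` play the role of the hidden variables; selecting a system for an instrument then amounts to ranking strategies by `predict` for that instrument.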

mytarmailS 2020.08.10 09:28 #19628

Maxim Dmitrievsky:
I emailed the author of the network my cuts and my indignation.

I wonder what he wrote back.

Maxim Dmitrievsky 2020.08.10 15:41 #19629

mytarmailS:

I wonder what he wrote back.

Nothing so far. There has to be some regularity in the samples, that's the whole point. It's a different approach. I think it should be trained on regular sets. That is, the lower the entropy in the series, the better the result, whereas in that dataset the samples are randomly shuffled. In oabotschneonsteyan, it is not so much the patterns that matter as their sequence.


Aleksey Vyazmikin 2020.08.11 22:26 #19630

elibrarius:
We mix the cleanest split with less clean ones, i.e. we worsen the result on the training set, which in principle does not matter to us. But it is also not a given that this will improve the result on the test set, i.e. generalization. Someone should try it... Personally, I don't think the generalization will be any better than a random forest's.

It's much easier to limit the tree depth and skip the last split, stopping at the previous one. We'll end up with the same kind of leaf, less pure than if we had made the extra split. Your option gives something in between making the split and not making it. For example, your method will average a leaf at depth level 7; it will be slightly purer than the leaf at depth level 6. I don't think generalization will change much from this, and testing the idea is a lot of work. You could also average a few trees with depth levels 6 and 7 and get about the same as with your method.
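The "something in between" remark can be illustrated with toy numbers. This is only a sketch under stated assumptions: the class counts are invented, and a simple mean of the depth-6 and depth-7 estimates stands in for whatever averaging is actually meant.

```python
def leaf_estimate(targets):
    """Share of class 1 among the samples that fall into a leaf (terminal node)."""
    return sum(targets) / len(targets)

# Hypothetical depth-6 leaf with 10 samples, split further into two depth-7 leaves.
left = [1, 1, 1, 1, 0]    # depth-7 child
right = [1, 1, 0, 0, 0]   # depth-7 child
parent = left + right     # depth-6 leaf, i.e. the tree without the last split

p6 = leaf_estimate(parent)
p7_left = leaf_estimate(left)
p7_right = leaf_estimate(right)

# Averaging the two depths lands between "split" and "no split" for each child.
blended_left = (p6 + p7_left) / 2
blended_right = (p6 + p7_right) / 2
print(p6, p7_left, blended_left, p7_right, blended_right)
```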

I probably didn't make it clear earlier that a leaf should hold at least 1% of the observations on small samples and at least 100 on large ones, so of course the splitting won't go so far that a leaf has no errors on either class.

You seem to have misunderstood the last step. I see it as a statistical evaluation of the remaining 1% of the sample: within this subsample we observe how the result improves with splits on different predictors and obtain subspace information, for example:

If A > x1, target 1 is correct in 40% of cases, and the rule covers 60% of the subsample

If B > x2, target 1 is correct in 55% of cases, and the rule covers 45% of the subsample

If A <= x1, target 1 is correct in 70% of cases, and the rule covers 50% of the subsample

Each such split has a significance coefficient (I haven't decided how to calculate it yet), and the last split has one as well.

And so on, say for up to 5-10 predictors. Then, when applying the model, if we reach the last split we add up the coefficients (or use a more complicated aggregation), and if the sum exceeds a threshold the leaf is classified as 1, otherwise as 0.
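A minimal sketch of this scoring step, under loud assumptions: the post says the coefficient formula is not decided yet, so `coefficient` below is a placeholder (accuracy above 50%, weighted by coverage), and the data, rules, and threshold are hypothetical.

```python
def fires(row, rule):
    """A rule is (feature, cut, greater): 'feature > cut' if greater, else 'feature <= cut'."""
    feature, cut, greater = rule
    return (row[feature] > cut) == greater

def rule_stats(rows, rule):
    """Accuracy for target 1 and coverage of a rule within the leaf subsample."""
    hit = [r for r in rows if fires(r, rule)]
    if not hit:
        return 0.0, 0.0
    return sum(r["target"] for r in hit) / len(hit), len(hit) / len(rows)

def coefficient(accuracy, coverage):
    """Placeholder significance coefficient; an assumption, not the author's formula."""
    return (accuracy - 0.5) * coverage

def classify(row, rules, coefs, threshold):
    """Sum the coefficients of the rules that fire; class 1 if the sum exceeds the threshold."""
    score = sum(c for rule, c in zip(rules, coefs) if fires(row, rule))
    return (1 if score > threshold else 0), score

# Hypothetical subsample that reached the last split of the tree.
leaf = [
    {"A": 1, "B": 5, "target": 1},
    {"A": 2, "B": 1, "target": 0},
    {"A": 3, "B": 4, "target": 1},
    {"A": 4, "B": 2, "target": 0},
    {"A": 5, "B": 6, "target": 0},
]
rules = [("A", 2, True), ("B", 3, True), ("A", 2, False)]
coefs = [coefficient(*rule_stats(leaf, rule)) for rule in rules]

label_a, score_a = classify({"A": 1, "B": 5}, rules, coefs, threshold=0.05)
label_b, score_b = classify({"A": 5, "B": 1}, rules, coefs, threshold=0.05)
print(coefs, label_a, label_b)
```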

A simple way to implement this is to force the forest to be built up to the penultimate split, and then exclude the already selected predictors from the sample so that new ones are chosen. Or simply, after building the tree, filter the sample by leaf and go through each predictor on its own in search of the best split that meets the completeness (coverage) and accuracy criteria.
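The second variant (filter the sample by leaf, then search each predictor on its own) might look roughly like this. It is a sketch, not the author's implementation: the names `best_split` and `leaf_rules`, the `min_coverage` criterion, and the toy data are all assumptions.

```python
def best_split(rows, feature, min_coverage=0.3):
    """Try every observed value as a cut, in both directions, and keep the
    most accurate rule for target 1 that still covers enough of the subsample."""
    best = None
    for cut in sorted({r[feature] for r in rows}):
        for greater in (True, False):
            hit = [r for r in rows if (r[feature] > cut) == greater]
            coverage = len(hit) / len(rows)
            if not hit or coverage < min_coverage:
                continue
            accuracy = sum(r["target"] for r in hit) / len(hit)
            if best is None or accuracy > best["accuracy"]:
                best = {"feature": feature, "cut": cut, "greater": greater,
                        "accuracy": accuracy, "coverage": coverage}
    return best

def leaf_rules(rows, features, used=()):
    """One rule per predictor, skipping predictors the tree has already used."""
    found = [best_split(rows, f) for f in features if f not in used]
    return [rule for rule in found if rule is not None]

# Hypothetical rows that fall into one leaf after filtering the sample.
leaf = [
    {"A": 1, "B": 5, "target": 1},
    {"A": 2, "B": 1, "target": 0},
    {"A": 3, "B": 4, "target": 1},
    {"A": 4, "B": 2, "target": 0},
    {"A": 5, "B": 6, "target": 0},
]
rule_a = best_split(leaf, "A")
rules_without_a = leaf_rules(leaf, ["A", "B"], used=("A",))
print(rule_a, rules_without_a)
```

Passing `used` mirrors the idea of excluding already selected predictors so that new ones are chosen.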

Also, the result on the training sample will improve if the other class, "0", means no action rather than the opposite entry; otherwise there can be both improvement and deterioration.

Machine learning in trading: theory, models, practice and algo-trading - page 1963