Machine learning in trading: theory, models, practice and algo-trading - page 2067

 
Aleksey Vyazmikin:

That's true, but if you remove the rows that are already heavily represented in the leaves (class "0"), there will be somewhat fewer of them and the quality shouldn't drop, while the relative share of "1" will grow. The model will then be able to consider, during the search, leaf variants that previously were not statistically justified.

Another option is to remove the unique leaves; they may simply interfere with learning.
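A minimal sketch of the idea in the quoted post: cap the number of class-0 rows per leaf so that the relative share of class 1 grows. It assumes every training row has already been mapped to a leaf id by some trained tree. `leaf_ids`, `labels`, `cap` and the function names are hypothetical, and keeping the first class-0 rows encountered (rather than a random sample) is a simplification.

```python
from collections import defaultdict


def cap_class0_per_leaf(leaf_ids, labels, cap):
    """Return indices of rows to keep: every class-1 row, plus at most
    `cap` class-0 rows per leaf (the first ones encountered)."""
    class0_seen = defaultdict(int)
    keep = []
    for i, (leaf, y) in enumerate(zip(leaf_ids, labels)):
        if y == 0:
            if class0_seen[leaf] >= cap:
                continue  # this leaf already holds enough class-0 rows
            class0_seen[leaf] += 1
        keep.append(i)
    return keep


def class1_share(labels):
    """Fraction of rows labelled 1."""
    return sum(labels) / len(labels)


if __name__ == "__main__":
    leaf_ids = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
    labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
    keep = cap_class0_per_leaf(leaf_ids, labels, cap=2)
    kept = [labels[i] for i in keep]
    print(keep)                                      # [0, 1, 4, 5, 6, 7, 8, 9]
    print(class1_share(labels), class1_share(kept))  # 0.4 0.5
```

The share of class 1 rises from 0.4 to 0.5 here only because class-0 rows were dropped; whether that helps on live data depends on whether the dropped rows describe a situation that keeps repeating.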

If there are many similar rows, the situation repeats often. If you throw them out, say class 0, and start trading a leaf with class 1 for real, then as long as the patterns don't change you will again get many 0s instead of the predicted 1, and suffer losses. What's the point?

ML can only predict on the assumption that patterns persist and behave as they did in the past. By removing patterns from training, you get randomness.

 
Aleksey Nikolayev:

Perhaps it is easier to explain with a practical example.

Point 1) tests the hypothesis that persistence/antipersistence fluctuates over the day, i.e. whether the price tends to continue or, conversely, to reverse depending on the time of day. This requires the correlation.

Points 2) and 3) test the hypothesis that price reversals "happen on schedule", so it is better to trade them at the "right" time.

Point 3) looks for flat (or trending) periods of the day by studying the empirical distribution of zigzag leg lengths.
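Point 3) could be sketched as follows: build a zigzag, then group its legs by the hour in which each leg starts and look at the distribution per hour. This is only one common zigzag definition (a leg is confirmed after a retracement of at least `threshold` price units), and "length" is taken here as the price size of a leg; the post does not say which zigzag parameters or which notion of length (size or duration) is meant. All names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def zigzag_pivots(prices, threshold):
    """Indices of zigzag turning points. A leg is confirmed once price
    retraces at least `threshold` (in price units) from its running extreme.
    Simplification: the first bar is taken as the first pivot."""
    if not prices:
        return []
    pivots = [0]
    direction = 0  # +1 = up-leg, -1 = down-leg, 0 = not decided yet
    ext = 0        # index of the running extreme of the current leg
    for i in range(1, len(prices)):
        p = prices[i]
        if direction == 0:
            if p - prices[0] >= threshold:
                direction, ext = 1, i
            elif prices[0] - p >= threshold:
                direction, ext = -1, i
        elif direction == 1:
            if p > prices[ext]:
                ext = i                      # up-leg makes a new high
            elif prices[ext] - p >= threshold:
                pivots.append(ext)           # top confirmed
                direction, ext = -1, i
        else:
            if p < prices[ext]:
                ext = i                      # down-leg makes a new low
            elif p - prices[ext] >= threshold:
                pivots.append(ext)           # bottom confirmed
                direction, ext = 1, i
    if direction != 0 and ext != pivots[-1]:
        pivots.append(ext)  # last extreme, not yet confirmed
    return pivots


def leg_lengths_by_hour(prices, hours, threshold):
    """Group zigzag leg sizes by the hour in which each leg starts."""
    piv = zigzag_pivots(prices, threshold)
    legs = defaultdict(list)
    for a, b in zip(piv, piv[1:]):
        legs[hours[a]].append(abs(prices[b] - prices[a]))
    return dict(legs)


if __name__ == "__main__":
    prices = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4]
    hours = [9] * 5 + [10] * 6
    print(zigzag_pivots(prices, 2))  # [0, 3, 6, 10]
    by_hour = leg_lengths_by_hour(prices, hours, 2)
    print(by_hour)                   # {9: [3, 3], 10: [4]}
    print({h: median(v) for h, v in by_hour.items()})
```

The last leg ends at an unconfirmed extreme, so in a real study it is probably safer to drop it. Storing `b - a` instead of the price difference would give leg durations in bars.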

1) Correlation between what and what?

 
Valeriy Yastremskiy:

I understand the daily part and agree. My question is about the day of the week. Time within the averaging period (24 hours) is not tied to the day of the week in the first place. To reveal intra-week recurrence while still accounting for the time of day, you could bind to the day of the week from the outset. You bind only to the time of day and the day of the month.

Of course, by analogy with minutes within a day, one could use (for example) five-minute slots within a week, but then the purely weekly recurrence would have to be separated from the recurrence inherited from the daily cycle, as well as from news effects and other non-stationarity. In any case, I'm not sure the weekly seasonality is as pronounced.
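A rough sketch of "five-minute slots within a week" with the inherited daily component subtracted: average some per-bar quantity (for example absolute 5-minute increments, an assumption) in each weekly slot, then subtract the average of the same intraday slot pooled over all weekdays. What remains is the part the daily profile does not explain. This handles only the daily inheritance; news effects and other non-stationarity are not addressed. Function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def slot_of_day(ts, minutes=5):
    """Index of the `minutes`-wide slot within the day."""
    return (ts.hour * 60 + ts.minute) // minutes


def slot_of_week(ts, minutes=5):
    """Index of the `minutes`-wide slot within the week (Monday = day 0)."""
    return ts.weekday() * (24 * 60 // minutes) + slot_of_day(ts, minutes)


def weekly_residual_profile(timestamps, values, minutes=5):
    """Mean value in each weekly slot minus the mean of the same intraday
    slot pooled over all weekdays, i.e. what the daily profile does not explain."""
    by_day = defaultdict(list)
    by_week = defaultdict(list)
    for ts, v in zip(timestamps, values):
        by_day[slot_of_day(ts, minutes)].append(v)
        by_week[slot_of_week(ts, minutes)].append(v)
    per_day = 24 * 60 // minutes
    daily = {s: mean(v) for s, v in by_day.items()}
    return {w: mean(v) - daily[w % per_day] for w, v in by_week.items()}


if __name__ == "__main__":
    ts = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 2, 9, 0)]  # Mon, Tue
    print(weekly_residual_profile(ts, [1.0, 3.0]))  # {108: -1.0, 396: 1.0}
```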

 
elibrarius:

1) Correlation between what and what?

Adjacent increments.
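A minimal sketch of point 1) with this answer plugged in: the Pearson correlation between adjacent increments, computed separately for each hour of the day. A positive value suggests persistence (moves tend to continue), a negative one antipersistence (moves tend to reverse). Attributing each pair of increments to the hour of the bar between them is an assumption, as are the function names.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Sample Pearson correlation; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def adjacent_increment_corr_by_hour(prices, hours, min_pairs=3):
    """Correlation of adjacent increments, grouped by hour of day.

    Increment d[i] = prices[i+1] - prices[i]. The pair (d[i-1], d[i]) is
    attributed to hours[i], the bar where the first increment ends and the
    second one starts."""
    d = [b - a for a, b in zip(prices, prices[1:])]
    pairs = defaultdict(lambda: ([], []))
    for i in range(1, len(d)):
        xs, ys = pairs[hours[i]]
        xs.append(d[i - 1])
        ys.append(d[i])
    return {h: pearson(xs, ys)
            for h, (xs, ys) in pairs.items() if len(xs) >= min_pairs}


if __name__ == "__main__":
    # Strictly alternating moves: perfect antipersistence.
    print(adjacent_increment_corr_by_hour([0, 1, 0, 1, 0, 1], [1] * 6))
```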

 
Aleksey Nikolayev:

Of course, by analogy with minutes within a day, one could use (for example) five-minute slots within a week, but then the purely weekly recurrence would have to be separated from the recurrence inherited from the daily cycle, as well as from news effects and other non-stationarity. In any case, I'm not sure the weekly seasonality is as pronounced.

Apparently I'm not expressing myself clearly. I'm not talking about weekly averaging. How do you tell whether an event recurs every day at 9 a.m., on the 3rd, 5th and 10th day of every month, or every Wednesday at 9 a.m.?

 
Valeriy Yastremskiy:

Apparently I'm not expressing myself clearly. I'm not talking about weekly averaging. How do you tell whether an event recurs every day at 9 a.m., on the 3rd, 5th and 10th day of every month, or every Wednesday at 9 a.m.?

I think I understood you. My point was that an event that repeats every day at 9 a.m. is also an event that repeats every Wednesday at 9 a.m. Because the daily periodicity is so strong, it would be quite difficult to isolate the events that have a TRULY weekly (but not daily) period. I could be wrong, of course, but I haven't noticed a clear weekly periodicity so far, so my code has no way of detecting it.
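One standard way to approach the "daily or only on Wednesdays?" question is a Pearson chi-square test: count, for each weekday, on how many days the event occurred at 9:00, and test whether one common rate fits all weekdays (the purely daily explanation). This is a swapped-in textbook technique, not what either poster uses; it assumes days are independent, which price data may violate, and the expected counts should not be too small (a usual rule of thumb is at least 5 per cell). The counts below are hypothetical.

```python
def chi_square_2xk(events, trials):
    """Pearson chi-square statistic for a 2 x k table.

    events[j] = days on which the event occurred in group j (e.g. weekday j
    at 9:00), trials[j] = number of such days observed. The null hypothesis
    is one common rate for all groups, i.e. a purely daily pattern."""
    rate = sum(events) / sum(trials)
    stat = 0.0
    for e, n in zip(events, trials):
        for observed, expected in ((e, n * rate), (n - e, n * (1 - rate))):
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Critical value of the chi-square distribution with 4 degrees of freedom
# (5 weekdays - 1) at the 5% significance level.
CRIT_DF4_ALPHA05 = 9.488

if __name__ == "__main__":
    # Hypothetical counts: how often the event occurred at 9:00, Mon..Fri,
    # out of 20 observed days each.
    flat = chi_square_2xk([10, 10, 10, 10, 10], [20] * 5)
    wednesday = chi_square_2xk([2, 2, 18, 2, 2], [20] * 5)
    print(flat > CRIT_DF4_ALPHA05, wednesday > CRIT_DF4_ALPHA05)  # False True
```

Failing to reject the common rate does not prove the pattern is purely daily; it only says the weekday split adds nothing detectable with this much data.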

 
elibrarius:

If there are many similar rows, the situation repeats often. If you throw them out, say class 0, and start trading a leaf with class 1 for real, then as long as the patterns don't change you will again get many 0s instead of the predicted 1, and suffer losses. What's the point?

ML can only predict on the assumption that patterns persist and behave as they did in the past. By removing patterns from training, you get randomness.

There are strategies, trend-following ones, that earn well at 40% accuracy, but standard ML methods can't train them: when accuracy is insufficient, they collapse class "1" to zero. I need exactly the splits that separate it and improve it, which is why I'm looking for such methods. Otherwise recall for class 1 ends up very small.

 
Aleksey Nikolayev:

Intraday volatility fluctuations interfere with the search for intraday patterns. We need to get rid of them somehow. Possible ways:

1) Rescaling increments by the volatility of their intraday time slot.

2) Switching to a new intraday time scale in which the variance grows uniformly.

3) Using a zigzag. The sizes of its legs do not depend on volatility fluctuations. The times of its vertices do depend on volatility, of course (they are denser where volatility is high), but after switching to uniform time these clusters disappear.
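Method 2) can be sketched as a "volatility clock": estimate the average variance of increments in each intraday slot, then map each slot to the share of the average daily variance accumulated by its end. Equal steps on this clock then carry roughly equal expected variance. Using the mean squared increment (assuming the mean intraday increment is close to zero) and the function names are assumptions of this sketch.

```python
def variance_profile(increments, slots, n_slots):
    """Mean squared increment in each intraday slot (0 .. n_slots-1)."""
    sq = [0.0] * n_slots
    cnt = [0] * n_slots
    for d, s in zip(increments, slots):
        sq[s] += d * d
        cnt[s] += 1
    return [sq[s] / cnt[s] if cnt[s] else 0.0 for s in range(n_slots)]


def volatility_clock(profile):
    """Map each slot to 'volatility time' in [0, 1]: the share of the average
    daily variance accumulated by the end of that slot."""
    total = sum(profile)
    clock, acc = [], 0.0
    for v in profile:
        acc += v
        clock.append(acc / total)
    return clock


if __name__ == "__main__":
    prof = variance_profile([1.0, -1.0, 2.0, 0.0], [0, 1, 2, 2], 3)
    print(prof)                    # [1.0, 1.0, 2.0]
    print(volatility_clock(prof))  # [0.25, 0.5, 1.0]
```

Sampling prices at equal steps of this clock, instead of equal clock minutes, is what would make the variance grow evenly; busy slots get stretched and quiet ones compressed.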

That's the theory... but in practice, like the monkey with the glasses, no matter how you twist it... )

I normalized the increments by their volatility, equalizing the variance. The only result was a loss of information.
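For reference, method 1) in its simplest form divides each increment by the root-mean-square increment of its intraday slot, so every slot ends up with unit mean square. This is a guess at the kind of normalization described, not the poster's code. One possible reason for the information loss is that after rescaling the volatility level itself, which may carry information, is no longer in the series; that is a hypothesis, not something the thread establishes.

```python
import math


def normalize_by_slot(increments, slots, n_slots):
    """Divide each increment by the RMS increment of its intraday slot, so every
    slot ends up with unit mean square. The profile is estimated on the whole
    sample here; live use would need it estimated on past data only."""
    sq = [0.0] * n_slots
    cnt = [0] * n_slots
    for d, s in zip(increments, slots):
        sq[s] += d * d
        cnt[s] += 1
    rms = [math.sqrt(sq[s] / cnt[s]) if cnt[s] else 0.0 for s in range(n_slots)]
    return [d / rms[s] if rms[s] > 0 else 0.0 for d, s in zip(increments, slots)]


if __name__ == "__main__":
    print(normalize_by_slot([2.0, -2.0, 0.5, -0.5], [0, 0, 1, 1], 2))
    # [1.0, -1.0, 1.0, -1.0]
```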

 
Aleksey Vyazmikin:

There are strategies, trend-following ones, that earn well at 40% accuracy, but standard ML methods can't train them: when accuracy is insufficient, they collapse class "1" to zero. I need exactly the splits that separate it and improve it, which is why I'm looking for such methods. Otherwise recall for class 1 ends up very small.

If it's a trend strategy, the TP is large and the SL is small, for example 500 vs 100. Then with an 80% error rate there will be 20% winning and 80% losing trades, and the balance will be near zero. If you trade leaves with a 70% error rate, you will already be in profit. And if you find 50/50 leaves, the profit will be huge.


What do you mean, collapse? With 70% errors it only looks as if everything collapses into class 0. On the remaining 30% of class 1 you can already make money.
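The arithmetic behind the 500/100 example, as a small sketch: the expected result per trade when every trade ends at either TP or SL. Spread, commission and slippage are ignored, and the function names are hypothetical.

```python
def expectancy(error_rate, tp=500.0, sl=100.0):
    """Expected result per trade in points for a fixed TP/SL pair,
    ignoring spread, commission and slippage."""
    return (1.0 - error_rate) * tp - error_rate * sl


def breakeven_error_rate(tp=500.0, sl=100.0):
    """Error rate at which expectancy is zero: (1 - e) * tp = e * sl."""
    return tp / (tp + sl)


if __name__ == "__main__":
    for e in (0.8, 0.7, 0.5):
        print(e, round(expectancy(e), 1))
    print(round(breakeven_error_rate(), 3))  # 0.833
```

This gives about +20 points per trade at 80% errors, +80 at 70% and +200 at 50/50. Break-even sits near an 83.3% error rate, so at 80% errors the small positive edge is likely to be eaten by trading costs, which matches "near zero".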

 
Aleksey Vyazmikin:

Maxim, I suspect that CatBoost does not export the C++ model correctly. Could you compare it with the Python model?

The model interpretation values in MQL5, which are taken from the CPP model, do not match the values from the binary model. The delta is around 0.15, which is a lot.

The Python one is C++ in a wrapper. Everything works fine.

I mean that the model can be saved either in Python format or in C++. I save it in C++ and then convert it to MQL with a few simple steps, since the model itself is just several arrays.
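Since the model is "several arrays", one way to locate a mismatch is an independent evaluator for oblivious (symmetric) trees, run on the same inputs as the ported model, reporting the largest absolute difference. The array layout below is a simplified hypothetical one, not CatBoost's exact export format. Details worth checking against the real exporter are the order in which splits map to bits of the leaf index, whether the comparison is `>` or `>=`, and whether both sides output the same kind of value (raw score vs probability); getting any of these wrong can produce small systematic deltas.

```python
def apply_oblivious_trees(x, tree_depth, split_feature, split_border,
                          leaf_values, bias=0.0):
    """Raw score of an ensemble of oblivious (symmetric) trees kept in flat arrays.

    Hypothetical layout, not CatBoost's exact export format:
      tree_depth[t]    - depth of tree t
      split_feature[k] - feature index of split k (all trees concatenated)
      split_border[k]  - threshold of split k; bit d is set when x[f] > border
      leaf_values      - 2**depth leaves per tree, all trees concatenated
    """
    total = bias
    split_pos = 0
    leaf_pos = 0
    for depth in tree_depth:
        index = 0
        for d in range(depth):
            f = split_feature[split_pos + d]
            if x[f] > split_border[split_pos + d]:
                index |= 1 << d  # split d contributes bit d of the leaf index
        total += leaf_values[leaf_pos + index]
        split_pos += depth
        leaf_pos += 1 << depth
    return total


def max_abs_delta(a, b):
    """Largest absolute difference between two prediction sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    depths = [1, 2]
    feats = [0, 0, 1]
    borders = [0.5, 0.0, 0.0]
    leaves = [-1.0, 2.0, 10.0, 20.0, 30.0, 40.0]
    rows = [[1.0, -1.0], [0.0, 1.0]]
    ours = [apply_oblivious_trees(r, depths, feats, borders, leaves) for r in rows]
    print(ours)  # [22.0, 29.0]
    reference = [22.0, 29.0]  # hypothetical scores from the other implementation
    print(max_abs_delta(ours, reference))
```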