Machine learning in trading: theory, models, practice and algo-trading - page 2066

 
Aleksey Vyazmikin:

I don't know how to explain :)

I took this function

Try XOR
 
Aleksey Nikolayev:

Intraday volatility fluctuations interfere with the search for intraday patterns. We need to get rid of them somehow. Possible ways:

1) Rescaling the increments to account for how volatility depends on the time of day.

2) Switching to a new intraday time in which the variance grows uniformly.

3) Using a zigzag. The sizes of its legs do not depend on volatility fluctuations. The timing of its vertices does, of course (they are more frequent where volatility is high), but after switching to the uniform time these clusters disappear.
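
Ways 1) and 2) above can be sketched roughly as follows. This is only an illustration under stated assumptions: the intraday "slot" is taken to be something like the hour of the bar, volatility per slot is estimated as the root-mean-square increment, and the function names are hypothetical, not anything from the thread.

```python
import math
from collections import defaultdict

def time_of_day_volatility(slots, increments):
    """Root-mean-square increment for each intraday slot (e.g. hour of day)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for slot, dx in zip(slots, increments):
        sums[slot] += dx * dx
        counts[slot] += 1
    return {s: math.sqrt(sums[s] / counts[s]) for s in sums}

def rescale_increments(slots, increments, vol):
    """Way 1 (assumed form): divide each increment by the volatility of its slot."""
    return [dx / vol[slot] if vol[slot] > 0 else 0.0
            for slot, dx in zip(slots, increments)]

def variance_clock(day_slots, vol):
    """Way 2 (assumed form): new intraday time = cumulative share of daily variance.

    day_slots lists the slots of one day in chronological order.
    """
    total = sum(vol[s] ** 2 for s in day_slots)
    t, clock = 0.0, []
    for s in day_slots:
        t += vol[s] ** 2 / total
        clock.append(t)
    return clock

# Toy data: two days, two slots per day, slot 1 is twice as volatile as slot 0.
slots = [0, 1, 0, 1]
increments = [1.0, 2.0, -1.0, -2.0]
vol = time_of_day_volatility(slots, increments)
rescaled = rescale_increments(slots, increments, vol)
clock = variance_clock([0, 1], vol)
```

In the new clock the volatile slot simply takes up a larger share of the "day", so the variance accumulated per unit of new time is the same everywhere.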

Is point 1 enough, or is point 2 needed as well? What is it? Please explain.
 
elibrarius:
What is a pre-analysis? You feed the feature to the models and compare the results with and without it.

Data analysis by means of mathematical statistics and, in a narrower sense, the search for useful differences between price and a random walk (SB). The main difference from machine learning (MO) is that the models are formulated explicitly and have a small set of parameters.

 
elibrarius:
Try XOR

I'll try different ones, maybe later. In general, I think we should do clustering and pull one row at a time from similar clusters, otherwise the quality of training drops anyway.

 
elibrarius:
Is point 1 enough? Or do you need 2 as well? What is it? Explain.

Perhaps it is easier to explain with an application example.

Point 1) is a test of the hypothesis that persistence and antipersistence fluctuate over the day, that is, a test of whether the tendency of price to continue or, on the contrary, to reverse its direction depends on the time of day. For this you need to know the correlation.

Points 2) and 3) test the hypothesis that price reversals happen "by the clock", so that it is better to act at the "right" time.

Point 3) is a search for the flat (or trending) times of day by studying the empirical distribution of zigzag leg lengths.
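
The check in point 1) could look roughly like this. It is a minimal sketch that assumes "the correlation" means the lag-1 correlation between consecutive price increments, grouped by the intraday slot of the later increment; the function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def lag1_correlation_by_slot(slots, increments, min_pairs=3):
    """Pearson correlation of (previous, current) increments per intraday slot.

    A positive value suggests persistence in that slot, a negative one
    antipersistence. Slots with too few pairs or zero variance are skipped.
    """
    pairs = defaultdict(list)
    for i in range(1, len(increments)):
        pairs[slots[i]].append((increments[i - 1], increments[i]))

    result = {}
    for slot, pts in pairs.items():
        n = len(pts)
        if n < min_pairs:
            continue
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts)
        syy = sum((p[1] - my) ** 2 for p in pts)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
        if sxx > 0 and syy > 0:
            result[slot] = sxy / math.sqrt(sxx * syy)
    return result

# Toy series: moves continue in slot 0 and reverse in slot 1.
increments = [1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1, 1]
slots = [i % 2 for i in range(len(increments))]
corr = lag1_correlation_by_slot(slots, increments)
```

On real data the per-slot estimates are noisy, so they would need to be compared against what a random walk produces on samples of the same size.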

 
Aleksey Nikolayev:

Perhaps it is easier to explain with an application example.

Point 1) is a test of the hypothesis that persistence and antipersistence fluctuate over the day, that is, a test of whether the tendency of price to continue or, on the contrary, to reverse its direction depends on the time of day. For this you need to know the correlation.

Points 2) and 3) test the hypothesis that price reversals happen "by the clock", so that it is better to act at the "right" time.

Point 3) is a search for the flat (or trending) times of day by studying the empirical distribution of zigzag leg lengths.

Over a couple of months, is the time tied only to the day of the month, or to the day of the week as well? Judging by the code, it is tied only to the time within the studied period.

 
Aleksey Vyazmikin:

I'll try different ones, maybe later. In general, I think we should do clustering and pull one row at a time from similar clusters, otherwise the quality of training drops anyway.

Every leaf in the tree is a cluster. And not just in terms of number of features, but also in terms of the best separation of classes
 
elibrarius:
Every leaf in the tree is a cluster. And not just in terms of the number of features, but also in terms of the best division of classes

That is correct, but if you remove some of the rows that are already plentiful in the leaves (class "0"), there will be somewhat fewer of them and the quality should not fall, while the relative share of class "1" will grow. The model will then be able to consider, during its search, leaf variants that previously were not statistically justified.

Another option is to remove unique leaves, as they can interfere with learning.
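
The thinning idea above can be sketched as follows. The assumptions are loud ones: each row already has a leaf index (with scikit-learn, for instance, `DecisionTreeClassifier.apply` returns such indices), class "0" is the over-represented class, and the cap per leaf is a free parameter. The function name and parameters are hypothetical.

```python
import random
from collections import defaultdict

def cap_rows_per_leaf(leaf_ids, labels, majority_label, max_per_leaf, seed=0):
    """Indices of rows kept after thinning majority-class rows in crowded leaves.

    Rows of other classes are always kept; in every leaf at most
    max_per_leaf rows of the majority class survive (chosen at random).
    """
    rng = random.Random(seed)
    majority_by_leaf = defaultdict(list)
    keep = []
    for i, (leaf, y) in enumerate(zip(leaf_ids, labels)):
        if y == majority_label:
            majority_by_leaf[leaf].append(i)
        else:
            keep.append(i)  # minority rows are never dropped
    for rows in majority_by_leaf.values():
        if len(rows) > max_per_leaf:
            rows = rng.sample(rows, max_per_leaf)
        keep.extend(rows)
    return sorted(keep)

# Toy example: leaf 1 holds three rows of class 0, the cap is two.
leaf_ids = [1, 1, 1, 1, 2, 2, 3]
labels = [0, 0, 0, 1, 0, 1, 0]
kept = cap_rows_per_leaf(leaf_ids, labels, majority_label=0, max_per_leaf=2)
```

The model would then be retrained on the kept rows only; whether quality really holds up is something to verify on a separate test sample.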

 
Valeriy Yastremskiy:

Over a couple of months, is the time tied only to the day of the month, or to the day of the week as well? Judging by the code, it is tied only to the time within the studied period.

For periods longer than a day, there is a problem of separating the influence of the news background from the periodicity. I do not really understand how it can be solved.

 
Aleksey Nikolayev:

For periods longer than a day, there is a problem of separating the influence of the news background from the periodicity. I do not really understand how it can be solved.

I understand the point about periods longer than a day, and I agree. My question is about the day of the week. The time within the averaging period (a day) is not tied to the day of the week to begin with. To reveal intra-week recurrence while taking the time of day into account, you could tie the time to the day of the week from the start. As it is, you only tie it to the time of day and the day of the month.
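
Tying the statistics to the day of the week as well as the time of day could be sketched like this. It assumes bar timestamps are available as `datetime` objects and uses the mean absolute increment as a stand-in statistic; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mean_abs_increment_by_weekday_hour(timestamps, increments):
    """Average absolute increment for every (weekday, hour) cell.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, dx in zip(timestamps, increments):
        key = (ts.weekday(), ts.hour)
        sums[key] += abs(dx)
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy data: two Mondays and one Tuesday, all at 10:00.
timestamps = [
    datetime(2021, 1, 4, 10),   # Monday
    datetime(2021, 1, 11, 10),  # Monday
    datetime(2021, 1, 5, 10),   # Tuesday
]
increments = [1.0, -3.0, 2.0]
profile = mean_abs_increment_by_weekday_hour(timestamps, increments)
```

Splitting by weekday leaves several times fewer observations per cell than a pure time-of-day profile, so a longer history is needed for the same accuracy.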
