Machine learning in trading: theory, models, practice and algo-trading - page 3510

 
Aleksey Vyazmikin #:
You emphasise the terms but not the content.

One either does not realise, or pretends not to realise, that there are no terms without content: a term and its content are one. If this correspondence does not exist, then it is just a set of words that have no meaning, i.e. a set of letters with no content.

I looked through the last few pages; it looks very much like outright trolling.

 
I think the essence of Alexey's method is that, having found quanta with very bad data, he removes those data from the common set.
Apparently he uses CatBoost, whose algorithm is greedy, but because of the data cleaning different splits are selected than before the cleaning. Perhaps they are better. I think this is more an illusion of a "non-greedy" algorithm obtained by preprocessing the data. You can get different tree variants with different data preprocessing/cleaning.
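To make that reading concrete, here is a minimal sketch (my own illustration, not Aleksey's actual code) of what "removing the bad quanta and retraining" could look like; the CatBoost parameters and the "bad segment" bounds are purely hypothetical.

```python
# Hedged sketch: drop rows that fall into a "bad" quantum segment of one feature,
# then retrain, so the greedy boosting sees different split candidates.
# Assumes the catboost package is installed; the segment bounds are made up.
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] > 0).astype(int)

# Hypothetical "bad" quantum segment found for feature 3: values in [-0.1, 0.1].
bad = (X[:, 3] >= -0.1) & (X[:, 3] <= 0.1)
X_clean, y_clean = X[~bad], y[~bad]

model_raw = CatBoostClassifier(iterations=100, depth=6, verbose=False).fit(X, y)
model_clean = CatBoostClassifier(iterations=100, depth=6, verbose=False).fit(X_clean, y_clean)
# The two models will generally choose different splits, even though the
# underlying algorithm is still greedy in both cases.
```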


A truly non-greedy algorithm can apparently be obtained only with your own tree code, or a ready-made one that has such a function. There seem to be packages where a split is selected not by the single (current) split alone, but by checking all splits one level deeper (though these may be experimental codes; I think I read about this in a book on the basics of ML). With 1000 features, the calculations will slow down by roughly a factor of 1000. The result is a 2-level greedy algorithm.

An absolutely non-greedy algorithm would build all possible tree variants. For example, with a depth of 6 and 1000 features that is 1000^6 = 10^18 tree variants. Such an algorithm can only be computed in reasonable time for 10-20 features.
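For illustration, a minimal sketch of what one level of lookahead could look like (my own toy code, not any particular package): the first split is judged not by its immediate impurity, but by the best impurity the next greedy split can reach in each child. The full non-greedy search grows roughly as F^d, i.e. 1000^6 = 10^18 variants for 1000 features at depth 6.

```python
# Toy comparison of a purely greedy split with a 1-step-lookahead ("2-level
# greedy") split on a binary target. Pure illustration; brute force only.
import numpy as np

def gini(y):
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_greedy(X, y):
    """Best (feature, threshold, score) by immediate weighted Gini of the children."""
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            m = X[:, f] <= t
            score = (m.sum() * gini(y[m]) + (~m).sum() * gini(y[~m])) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best

def best_lookahead(X, y):
    """Best first split judged by what the *next* greedy split achieves in each child."""
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            m = X[:, f] <= t
            if m.all() or not m.any():
                continue
            score = (best_greedy(X[m], y[m])[2] * m.sum()
                     + best_greedy(X[~m], y[~m])[2] * (~m).sum()) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best

# XOR-like target: the greedy criterion sees no immediate gain, lookahead does.
X = np.random.default_rng(0).normal(size=(120, 3))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)
print(best_greedy(X, y)[:2], best_lookahead(X, y)[:2])
```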
 
Aleksey Nikolayev #:
As an epigraph, from Goethe's Faust: "Student: Yes, but words must correspond to concepts."

I emphasise the correspondence from the epigraph. The following questions interest me: (1) do you think that your algorithm gives a guaranteed (or at least more probable) global maximum of your custom metric? (2) If the answer to the first question is "yes", what makes that possible?

If the answer to the first question is "no", then you are simply flushing the possibility of adequate communication down the toilet through an arbitrary substitution of concepts.

I see you are an erudite person - my stereotypical thinking paints a picture of someone whose favourite pastime on public transport is filling in crossword puzzles. Admittedly, that stereotype is probably 20 years out of date by now.

However, let me give a little more information than you asked for in my reply. I hope that a more detailed answer will help you understand me better.

Quantisation can worsen the search for an optimal solution; I have shown this in my articles. There are a number of reasons for this. The final result of the algorithm depends on choosing the right bounds (splits).

I believe that market data contain patterns, but they are interspersed with random events. When a small number of classes is used for classification, different patterns get mixed within the classes. (Which in my eyes justifies clustering - but that is a separate topic.)

My premise is that market history does not allow forming a representative sample of all patterns. Split evaluation metrics such as the Gini criterion and entropy are computed from the counts of class members in the subsamples before and after the split.
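For reference, a minimal sketch of what such metrics compute (standard formulas, not the author's code): both Gini impurity and entropy are functions of the class counts in each subsample, weighted by subsample size.

```python
# Gini impurity and entropy of a split, computed purely from class counts.
import numpy as np

def gini(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def split_impurity(left_counts, right_counts, metric=gini):
    """Size-weighted impurity of the two subsamples produced by a split."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * metric(left_counts) + (n_right / n) * metric(right_counts)

# Example: two classes; the split sends 80/20 to the left and 30/70 to the right.
print(split_impurity([80, 20], [30, 70]))          # Gini
print(split_impurity([80, 20], [30, 70], entropy)) # entropy
```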

If we accept these statements, then it is worth assuming that the optimal solution may in fact be obtained on incomplete data and be wrong with some probability (I do not know how to calculate this probability - any ideas?).

Logically, if we can find a way to reduce the probability of selecting split candidates from unrepresentative data, then even if we follow the greedy principle after selecting them, the chances increase of making a correct split (one that, if the observations are representative, will also look good by the standard summary metrics) that tends to separate the classes on new data with acceptable efficiency.

Therefore, I focus my attention on the metric of the probability of choosing an effective binary split (quantum split) at each iteration. The final metric for selecting a quantum segment from the selected population of splits and moving to the next iteration may differ; this question is still open for me.
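One possible way to put a number on that probability (my own sketch, not the author's metric; the margin and the class setup are assumptions) is to check how often a candidate split keeps its class-share advantage on bootstrap subsamples:

```python
# Hedged sketch: estimate the "stability" of a candidate quantum split as the
# fraction of bootstrap subsamples on which the segment x <= threshold still
# holds a class-share advantage of at least `margin` over the overall share.
import numpy as np

rng = np.random.default_rng(0)

def split_stability(x, y, threshold, target_class=1, margin=0.05, n_boot=200):
    x, y = np.asarray(x), np.asarray(y)
    base_share = np.mean(y == target_class)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))
        xb, yb = x[idx], y[idx]
        seg = yb[xb <= threshold]
        if len(seg) and np.mean(seg == target_class) > base_share + margin:
            hits += 1
    return hits / n_boot

# Toy feature where low values are mildly predictive of class 1.
x = rng.normal(size=2000)
y = (rng.random(2000) < np.where(x <= -0.5, 0.65, 0.45)).astype(int)
print(split_stability(x, y, threshold=-0.5))
```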

Below I give examples of estimating the probability of selecting an effective (or stable, as I prefer to think of it) quantum segment over a series of iterations, for each class separately. The first graph uses lowered selection criteria for quantum segments - 300 iterations.

The next graph uses stricter selection criteria - 50 iterations (faster learning).

We can see that in the first case the probability estimate grows only slightly, while in the second case it starts from high values and gradually declines after many iterations.

Based on these observations, I conclude that additional quantum segment selection metrics affect the learning result.

Let me remind you that my algorithm was originally created (I simply call it "Drop", though "Distillation" is closer in meaning) not for building the final effective decision tree or some market model, but as a way of exploring the data, with whose help more successful splits can be selected for the quantisation tables. A greater effect from the algorithm can presumably be obtained by binarising the selected quantum splits. Despite the one-sidedness of the decision tree construction in this algorithm, we can confidently assert that increasing the number of quantum splits that remain effective on new data, together with improving the method of selecting quantum splits from the obtained set, will lead to a better final result of the model, although Recall will be small.
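As an illustration of the binarisation step (my own sketch, under the assumption that "binarising a quantum split" means turning each selected threshold into a separate 0/1 feature):

```python
# Turn each selected (feature, threshold) quantum split into a binary column.
import numpy as np

def binarise_by_splits(X, selected_splits):
    """selected_splits: list of (feature_index, threshold) pairs judged effective.
    Returns one binary column per split: 1 if the value is at or below the threshold."""
    X = np.asarray(X, dtype=float)
    cols = [(X[:, f] <= t).astype(int) for f, t in selected_splits]
    return np.column_stack(cols)

# Example with two hypothetical selected splits on features 0 and 3.
X = np.random.default_rng(1).normal(size=(10, 5))
print(binarise_by_splits(X, [(0, -0.5), (3, 1.2)]))
```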

Each step of the algorithm can be improved; I do not claim that this is the product in its final form - I am still working on it and have tests on many samples planned.

Was I able to answer your question?

 
mytarmailS #:
So the whole point of this clever algorithm is that it learns not on the ordinary data, but on the centroids of clusters of this data?

And then you just select the best rules from the model by some metric.
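For context, a minimal sketch of the approach described in this question (training on cluster centroids and then filtering rules), which, as the reply below notes, is not the "quantum segment" method under discussion; assumes scikit-learn, and all numbers are made up:

```python
# Cluster each class, keep only the centroids, fit a small tree on them;
# rules from the tree could then be filtered by any custom metric.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

centroids, labels = [], []
for cls in (0, 1):
    km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X[y == cls])
    centroids.append(km.cluster_centers_)
    labels.append(np.full(20, cls))

model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(np.vstack(centroids), np.concatenate(labels))
print(model.score(X, y))
```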

Yes, I have written about different algorithms - indeed, there is one like that among them. But the last few pages are not about it.

 
СанСаныч Фоменко #:

One either does not realise, or pretends not to realise, that there are no terms without content: a term and its content are one. If this correspondence does not exist, then it is just a set of words that have no meaning, i.e. a set of letters with no content.

I looked through the last few pages; it looks very much like outright trolling.

Then give me examples of the mismatch between terms and content in my words.

Terms help condense information for a narrow group of people from a certain field, but meanings can be different or interpretations can be broadened.

To reduce the influence of terminology, I have already described the essence in plain, expanded wording more than once.

 
Aleksey Vyazmikin #:

Yes, I have written about different algorithms - indeed, there is one like that among them. But the last pages are not about it.

What are the x- and y-axes in those pictures, then?
 
Forester #:
I think the essence of Alexey's method is that, having found quanta with very bad data, he removes those data from the common set.
Apparently he uses CatBoost, whose algorithm is greedy, but because of the data cleaning different splits are selected than before the cleaning. Perhaps they are better. I think this is more an illusion of a "non-greedy" algorithm obtained by preprocessing the data. You can get different tree variants with different data preprocessing/cleaning.


A truly non-greedy algorithm can apparently be obtained only with your own tree code, or a ready-made one that has such a function. There seem to be packages where a split is selected not by the single (current) split alone, but by checking all splits one level deeper (though these may be experimental codes; I think I read about this in a book on the basics of ML). With 1000 features, the calculations will slow down by roughly a factor of 1000. The result is a 2-level greedy algorithm.

An absolutely non-greedy algorithm would build all possible tree variants. For example, with a depth of 6 and 1000 features that is 1000^6 = 10^18 tree variants. This can be calculated in reasonable time only for 10-20 features.
I also doubt that non-greedy trees are realistic in practice.
 
Aleksey Vyazmikin #:

Yeah - a normal base-case scenario.

Plus or minus a minute is the time it takes to get such models, automatically. Not counting the time to develop the algo :)

Mean reversion clusters work especially well on flat (range-bound) pairs.

There are peculiarities to this kind of kitchen, for example, which features to do the clustering on.

 
Aleksey Vyazmikin #:

Each step of the algorithm can be improved; I do not claim this is a product in its final form - I am still working on it and have tests planned on many samples.

Do you later, at the final stage, pull out the "effective" segments or splits as separate rules, or how do you separate them from the general mass, so that the final model trades only on them?

That is, explain it as if for dummies: you trained the model, identified the segments - then what? Is there a simple logic that can be reproduced?
 
Aleksey Vyazmikin #:

I see you are an erudite person - my stereotypical thinking paints a picture of someone whose favourite pastime on public transport is filling in crossword puzzles. Admittedly, that stereotype is probably 20 years out of date by now.

However, let me give a little more information than you asked for in my reply. I hope that a more detailed answer will help you understand me better.

Quantisation can worsen the search for an optimal solution; I have shown this in my articles. There are a number of reasons for this. The final result of the algorithm depends on choosing the right bounds (splits).

I believe that market data contain patterns, but they are interspersed with random events. When a small number of classes is used for classification, different patterns get mixed within the classes. (Which in my eyes justifies clustering - but that is a separate topic.)

My premise is that market history does not allow forming a representative sample of all patterns. Split evaluation metrics such as the Gini criterion and entropy are computed from the counts of class members in the subsamples before and after the split.

If we accept these statements, then it is worth assuming that the optimal solution may in fact be obtained on incomplete data and be wrong with some probability (I do not know how to calculate this probability - any ideas?).

Logically, if we can find a way to reduce the probability of selecting split candidates from unrepresentative data, then even if we follow the greedy principle after selecting them, the chances increase of making a correct split (one that, if the observations are representative, will also look good by the standard summary metrics) that tends to separate the classes on new data with acceptable efficiency.

Therefore, I focus my attention on the metric of the probability of choosing an effective binary split (quantum split) at each iteration. The final metric for selecting a quantum segment from the selected population of splits and moving to the next iteration may differ; this question is still open for me.

Below I give examples of estimating the probability of selecting an effective (or stable, as I prefer to think of it) quantum segment over a series of iterations, for each class separately. The first graph uses lowered selection criteria for quantum segments - 300 iterations.

The next graph uses stricter selection criteria - 50 iterations (faster learning).

We can see that in the first case the probability estimate grows only slightly, while in the second case it starts from high values and gradually declines after many iterations.

Based on these observations, I conclude that additional quantum segment selection metrics affect the learning result.

Let me remind you that my algorithm was originally created (I simply call it "Drop", though "Distillation" is closer in meaning) not for building the final effective decision tree or some market model, but as a way of exploring the data, with whose help more successful splits can be selected for the quantisation tables. A greater effect from the algorithm can presumably be obtained by binarising the selected quantum splits. Despite the one-sidedness of the decision tree construction in this algorithm, we can confidently assert that increasing the number of quantum splits that remain effective on new data, together with improving the method of selecting quantum splits from the obtained set, will lead to a better final result of the model, although Recall will be small.

Each step of the algorithm can be improved; I do not claim that this is the product in its final form - I am still working on it and have tests on many samples planned.

Was I able to answer your question?

The fact that you imagine me as a pedantic formalist has no bearing on the need to communicate only within a common conceptual space.

You want to talk about complex concepts without agreeing on simple ones, but it doesn't work that way.

I can share my perception of you as well. You just want recognition for the great work you do. But this forum is a completely unsuitable place for that.

I will write on the substantive part of your comment sometime later.
