Machine learning in trading: theory, models, practice and algo-trading - page 1620

 
Aleksey Vyazmikin:

What do we feed in for clustering - all the predictors in the sample, or what?

Well, in my opinion yes... but you can experiment, too.

 
Aleksey Nikolayev:

I will take the idea to some logical conclusion. Suppose we have a set of systems on the same asset. Each system, when it is in the market, holds a position of fixed volume, but the direction can change. The returns and volatilities of the strategies are known. Let us define the correlation between two strategies by the formula (t1-t2)/sqrt(T1*T2), where T1 and T2 are the durations of their time in the market, and t1 and t2 are the durations of the time when the strategies are simultaneously in the market and directed the same way or in opposite directions, respectively. This is a simplified formula derived under the assumption that the price is close to a random walk (SB). Now we have all the data needed to apply Markowitz theory and find an optimal portfolio.

Obviously, we will not get any meaningful portfolios this way (at least, because only one asset is used). We need some modifications.

1) Change the optimization algorithm (parameter limits, penalties). Refine the definition of the correlation between strategies.

2) Optimize the portfolio already at the moment the strategies are created, i.e. look for strategies based on a portfolio-optimality condition. It is not quite clear how to formalize this in a practically applicable way, but the approach seems more logical overall. Although, as you have already written, the algorithms would need to be rewritten, etc., and it is not obvious it is worth the trouble.

Everything you say is true. But I would not take into account only the time in the market - we need some measure of performance over that period, because it is not enough to enter the market, we must also exit in time. Here, conditionally, two similar strategies, one with a fixed take profit and one without, will show a high time correlation but different financial results. And two similar strategies can still be useful: for example, one gives a profit in a flat market while the other gives more profit when a trend is running, which makes the balance curve smoother.

If there are not too many strategies, their direct search and joint estimation is possible.
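
To make the quoted correlation definition concrete, here is a minimal numeric sketch (Python with numpy). All the per-strategy statistics below are made-up numbers, and the plain unconstrained minimum-variance weights only stand in for a proper Markowitz optimization:

```python
import numpy as np

# Hypothetical per-strategy statistics (made-up numbers for illustration):
# mu    - mean return of each strategy
# sigma - volatility of each strategy
# T     - time each strategy spends in the market
# t_same[i, j] / t_opp[i, j] - time strategies i and j are simultaneously
#                              in the market in the same / opposite direction
mu    = np.array([0.10, 0.08, 0.12])
sigma = np.array([0.20, 0.15, 0.25])
T     = np.array([800.0, 600.0, 700.0])
t_same = np.array([[800, 200, 150],
                   [200, 600, 100],
                   [150, 100, 700]], dtype=float)
t_opp  = np.array([[  0,  50, 120],
                   [ 50,   0,  80],
                   [120,  80,   0]], dtype=float)

# Correlation as defined in the quoted post: (t1 - t2) / sqrt(T1 * T2)
corr = (t_same - t_opp) / np.sqrt(np.outer(T, T))
cov = corr * np.outer(sigma, sigma)

# Plain minimum-variance weights (no constraints, weights sum to 1)
ones = np.ones(len(mu))
w = np.linalg.solve(cov, ones)
w /= w.sum()
print("correlation matrix:\n", corr)
print("min-variance weights:", w)
```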

 
mytarmailS:

Well, in my opinion, yes... but we can also experiment

And what algorithm should I choose to get an acceptable result without randomization and with a reasonable calculation time? I'm not very good at clustering.

 
Aleksey Vyazmikin:

I studied CatBoost, so I will talk about it.

The recommended tree depth is 4-6 splits, and that is roughly the depth I am trying.

Predictor splitting is done by one of three different algorithms that you can choose from; this is how the so-called grid is created.

It is interesting to pull out the splitting results and look at them yourself. By the way, does AlgLib divide predictors evenly when building a tree for a forest?

AlgLib splits the incoming chunk by the median (to be precise - not by the midpoint of the range, but by the median). That is, if 100 examples came in, it sorts the values and splits at the value of the 50th example. There is a quantile variant in the code, but it is not used.

I remembered that XGBoost has a random split option. It seems CatBoost has the same.

In general, it is strange that they recommend such shallow trees.
As I have written before, with such a shallow tree it is unlikely that a sector like 20-30% of a predictor's value range will be selected. At best it will be split 1 or 2 times - by medians, or by random values in the boosts.
If the depth were 100, we could well reach a 20-30% sector on some predictor.

I assume that in the boosts this is compensated for by the large number of refining trees, which can use other predictors that were not used in the main tree, but they too will only be split 1-2 times.
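
For illustration only, a rough Python sketch of the median split described above, as I read the description (not AlgLib's or CatBoost's actual code), plus a quantile-border variant for comparison:

```python
import numpy as np

def median_split(values):
    """Split a chunk of samples on one predictor at the median, the way the
    post above describes it: sort the values and use the middle example's
    value as the threshold. Illustration only, not AlgLib's actual code."""
    v = np.sort(np.asarray(values, dtype=float))
    threshold = v[len(v) // 2]                   # middle of the sorted chunk
    left = [x for x in values if x < threshold]
    right = [x for x in values if x >= threshold]
    return threshold, left, right

def quantile_borders(values, n_borders=3):
    """Alternative: several borders at quantiles instead of a single median."""
    qs = np.linspace(0.0, 1.0, n_borders + 2)[1:-1]   # interior quantiles only
    return np.quantile(values, qs)

x = np.random.randn(100)                         # 100 incoming examples
thr, left, right = median_split(x)
print(thr, len(left), len(right))
print(quantile_borders(x))
```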

 
Aleksey Vyazmikin:

And what algorithm should I choose to get an acceptable result without randomization and with a reasonable calculation time? I'm not very good at clustering.

In principle, any of them can do it (knn, som, dtwclust...); which variant is best will of course only be shown by experiment...

Don't get me wrong, I'm not using what I wrote about myself; I just read your idea, looked at it from a slightly different implementation angle and spoke up... I don't guarantee any result...
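
If it helps, a minimal sketch with plain k-means from scikit-learn (any of the libraries mentioned above would do just as well); the matrix X here is only a random stand-in for the sample's predictors, and a fixed random_state keeps the result reproducible:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# X stands in for the sample's predictor matrix (rows = examples, columns = predictors)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))

# Scale first so no single predictor dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Plain k-means: fast, and reproducible with a fixed random_state
km = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = km.fit_predict(X_scaled)
print(np.bincount(labels))  # how many examples fell into each cluster
```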

 
elibrarius:

AlgLib splits the incoming chunk by the median. That is, if 100 examples came in, it sorts the values and splits at the value of the 50th example. There is a quantile variant in the code, but it is not used.

I remembered that XGBoost has a random split option. It seems CatBoost has the same.

In general, it is strange that they recommend such shallow trees.
As I have written before, with such a shallow tree it is unlikely that a sector like 20-30% of a predictor's value range will be selected. At best it will be split 1 or 2 times - by medians, or by random values in the boosts.
If the depth were 100, we could well reach a 20-30% sector on some predictor.

I assume that in the boosts this is compensated for by the large number of refining trees that can use other predictors not used in the main tree.

The reality may not be what we imagine - we should try to reproduce the splitting algorithms from CatBoost and see what actually happens there and how correct it is.

Regarding randomness - there is randomness in the choice of the split from the predictor grid: not the best one, but a random one, if I understand correctly. And there are algorithms that divide the grid into ranges unevenly.
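
As a hedged illustration of those grid settings (parameter names taken from the CatBoost Python package; worth checking against your version's documentation), a sketch like this lets you switch the border-building mode and dump the resulting grid; the data is synthetic:

```python
import numpy as np
from catboost import CatBoostClassifier

# Synthetic data just to make the example self-contained
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

model = CatBoostClassifier(
    iterations=200,
    depth=6,                       # the shallow 4-6 depth discussed above
    border_count=254,              # maximum number of borders per float feature
    feature_border_type="Median",  # or "Uniform", "GreedyLogSum", "MinEntropy", ...
    random_strength=1.0,           # adds randomness to the split scoring
    verbose=False,
)
model.fit(X, y)
# Dump the resulting grid of borders to a file to inspect it
# (save_borders is available in recent CatBoost versions)
model.save_borders("borders.tsv")
```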

 
mytarmailS:

In principle, any of them can do it (knn, som, dtwclust...); which variant is best will of course only be shown by experiment...

Don't get me wrong, I'm not using what I wrote about myself; I just read your idea, looked at it from a slightly different implementation angle and spoke up... I don't guarantee any result.

I'm not asking about guarantees - I'm just curious to understand your idea.

 


The neuro indicator is almost ready)) Strictly speaking it is an Expert Advisor - I did not have enough skill to make it a pure indicator.
 
The orange area at the top predicts a downward move, the green area at the bottom an upward move; the thickness reflects the neural network's degree of confidence. It works only on BTCUSD M1 (for now...).
Is it cool? ))
 
Evgeny Dyuka:
The orange area at the top predicts a downward move, the green area at the bottom an upward move; the thickness reflects the network's degree of confidence. It works only on BTCUSD M1 (for now).
Is it cool? ))

I would say not bad, but it's frustrating.

In essence, it works like an ordinary overbought/oversold indicator.

Sometimes it's right, sometimes it's wrong, it shouldn't be like this...

Have you tested this net for trading at all? My experience tells me that it will not make money...

Unless you put a filter on the network's "confidence".
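
For what it's worth, such a filter can be as simple as this sketch (the threshold and names are arbitrary, not taken from the EA above):

```python
def filtered_signal(prob_up, threshold=0.75):
    """Turn the network's predicted probability of an upward move into a trade
    signal only when the model is confident enough. Returns +1 (long),
    -1 (short) or 0 (stay out). The 0.75 threshold is arbitrary."""
    if prob_up >= threshold:
        return 1
    if prob_up <= 1.0 - threshold:
        return -1
    return 0

# Example probabilities as they might come from the indicator's network
for p in (0.52, 0.81, 0.12):
    print(p, "->", filtered_signal(p))
```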
