Machine learning in trading: theory, models, practice and algo-trading - page 2586

 
mytarmailS #:

))))

I'm out)

I'll give you a hint: in terms of generalization, nothing changes
 
Maxim Dmitrievsky #:
As far as I understand it, logloss shows the amount of mutual information between the features and the target. It is the most objective function, since it does not assume any form of the dependence. The model is trained to minimize the loss of that information, and boosting in particular works this way. Whatever custom metric you add on top will only be used to stop training.

Logloss appears to be derived from the maximum likelihood principle for a binomial distribution. In mathematical statistics, the maximum likelihood principle is extended and generalized in the form of M-estimators, which can serve as some theoretical justification for experiments (but not a guarantee of their success, of course).
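A minimal sketch of that maximum likelihood connection, assuming independent Bernoulli outcomes (a binomial with one trial each): the mean logloss is the negative log of the joint likelihood divided by the number of observations. The function names and the sample numbers are made up for illustration.

```python
import math

def logloss(y_true, p_pred):
    """Mean binary log loss; p_pred holds predicted probabilities of class 1."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred))
    return -total / len(y_true)

def bernoulli_likelihood(y_true, p_pred):
    """Joint likelihood of independent Bernoulli outcomes."""
    lik = 1.0
    for y, p in zip(y_true, p_pred):
        lik *= p if y == 1 else 1 - p
    return lik

# Illustrative labels and predicted probabilities (not real data)
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]

# Negative log-likelihood per observation
nll = -math.log(bernoulli_likelihood(y, p)) / len(y)
print(logloss(y, p), nll)  # the two numbers agree up to rounding
```

Minimizing logloss over the model parameters is therefore the same as maximizing this likelihood.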

 
Aleksey Nikolayev #:

Logloss appears to be derived from the maximum likelihood principle for a binomial distribution. In mathematical statistics, the maximum likelihood principle is extended and generalized in the form of M-estimators, which can serve as some theoretical justification for experiments (but not a guarantee of their success, of course).

I probably confused it with cross-entropy, though that is more often used for multiclass. In any case, I see no problem with making an additional evaluation using any function of your own, but purely from the balance curve.
 
Maxim Dmitrievsky #:
I probably confused it with cross-entropy, though that is more often used for multiclass.

There seems to be a result that the theoretical minimum of logloss coincides with the entropy.
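A hedged illustration of that claim for a single binary outcome with true probability p: the expected logloss of a constant forecast q is minimized at q = p, and the minimum equals the Shannon entropy of p. With features, the analogous quantity would be the conditional entropy of the target given the features. The value p = 0.3 and the search grid are arbitrary choices.

```python
import math

def expected_logloss(p, q):
    """Expected log loss when the true P(y=1) is p and the forecast is q."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def entropy(p):
    """Shannon entropy (in nats) of a Bernoulli(p) outcome."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

p = 0.3
grid = [i / 1000 for i in range(1, 1000)]  # candidate forecasts q
best_q = min(grid, key=lambda q: expected_logloss(p, q))

print(best_q, expected_logloss(p, best_q), entropy(p))
```

Any forecast other than the true probability gives a strictly larger expected logloss, so the entropy acts as a floor that no model can beat on average.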

Maxim Dmitrievsky #:
In any case, I see no problem with making an additional evaluation using any function of your own, but purely from the balance curve.

I guess so. The only thing that bothers me is the lack of articles on the subject) People are probably afraid to give away the good fishing spots)

 
Aleksey Nikolayev #:

There seems to be a result that the theoretical minimum of logloss coincides with the entropy.

I guess so. The only thing that bothers me is the lack of articles on the subject) People are probably afraid to give away the good fishing spots)

Prado has a lot of interesting material on this topic; his articles are on his site. I have put them on the back burner, but I would like to read them later. One of the sanest authors)
 
Maxim Dmitrievsky #:
Prado has a lot of interesting material on this topic; his articles are on his site. I have put them on the back burner, but I would like to read them later. One of the sanest authors)

Yes, I must have a look at his articles. He has a lot of them, though)

Regarding your idea of combining standard and custom metrics, I remembered an approach with a tree that is grown using cross-entropy and pruned by error rate. Instead of the error rate, you could probably try your own metric.
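A rough sketch of how that could look, with loud assumptions: the tree is grown greedily on entropy, and then pruned bottom-up on a hold-out set with a pluggable metric. This is simple reduced-error-style pruning, swapped in for brevity, not the cost-complexity pruning of CART. All names, the toy data and the use of plain accuracy as the "custom" metric are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def grow(X, y, depth=0, max_depth=3):
    """Grow a binary tree by greedily minimizing weighted child entropy."""
    node = {"pred": Counter(y).most_common(1)[0][0]}
    if depth == max_depth or len(set(y)) == 1:
        return node
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * entropy([y[i] for i in left])
                     + len(right) * entropy([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no feature has two distinct values
        return node
    _, j, t, left, right = best
    node["feat"], node["thr"] = j, t
    node["left"] = grow([X[i] for i in left], [y[i] for i in left],
                        depth + 1, max_depth)
    node["right"] = grow([X[i] for i in right], [y[i] for i in right],
                         depth + 1, max_depth)
    return node

def predict(node, row):
    while "feat" in node:
        node = node["left"] if row[node["feat"]] <= node["thr"] else node["right"]
    return node["pred"]

def count_leaves(node):
    if "feat" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def prune(node, root, X_val, y_val, metric):
    """Collapse a subtree into a leaf unless the validation metric
    (higher is better) gets worse."""
    if "feat" not in node:
        return
    prune(node["left"], root, X_val, y_val, metric)
    prune(node["right"], root, X_val, y_val, metric)
    before = metric(y_val, [predict(root, r) for r in X_val])
    saved = {k: node.pop(k) for k in ("feat", "thr", "left", "right")}
    after = metric(y_val, [predict(root, r) for r in X_val])
    if after < before:
        node.update(saved)  # pruning hurt the metric, so restore the split

def accuracy(y_true, y_pred):
    """Stand-in for a custom metric; any function with this signature works."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy data: the true rule is x >= 5, with two noisy training labels
X_train = [[x] for x in range(10)]
y_train = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
X_val = [[x + 0.5] for x in range(10)]
y_val = [0] * 5 + [1] * 5

tree = grow(X_train, y_train, max_depth=4)
leaves_before = count_leaves(tree)
score_before = accuracy(y_val, [predict(tree, r) for r in X_val])
prune(tree, tree, X_val, y_val, accuracy)
leaves_after = count_leaves(tree)
score_after = accuracy(y_val, [predict(tree, r) for r in X_val])
print(leaves_before, score_before, leaves_after, score_after)
```

Because the metric is evaluated on the predictions of the whole tree, it does not have to be additive over samples, so a metric computed from a balance curve could be plugged in the same way.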

 
Renat Fatkhullin #:
Can you share the information?
1) Do you use the python library of MT5?
2) Do you use it outside or inside MT5?
3) What functions does the library lack? Access to indicators?

We are preparing an MQL5 upgrade that adds fast matrix operations. This will make it possible to run large-scale calculations properly.

We will also develop connectors to analytical packages and implement the standard WinML integration.

1. Sometimes.

2. Both inside and outside.

3.

Event subscriptions, so that the same events as in MQL5 trigger the corresponding handler methods.

The ability to write strategies in Python (as an application type) that can be run in the backtester.

Some ready-made mechanism for MQL5-Python interaction. Python can interact with MT5 in both directions, and MQL5 can interact with it in both directions too. MQL5 is one unit with the terminal and is very good at working with trading functions and the like, but it cannot work comfortably with the industry's top data science solutions. Python is the industry standard in data science (pandas, numpy, TensorFlow, Keras, PyTorch, etc.), but it is much less integrated with the trading platform. It would be great to have a standard way of combining these two strengths. For example, you attach a Python script in MT5, and the script keeps a model (or a pool of models) warmed up, along with functions that preprocess data and so on. Alongside this script there is an MQL5 application with the strategy, which does its own job and calls the ML functionality from the script when needed, quickly and without workarounds.

 

Does anyone use the R package quantstrat for backtesting strategies?

How is it in terms of speed?

 
Aleksey Nikolayev #:

Honestly, I don't understand much. The question is, does the probability change over time? To study this, you can simply fit a logistic regression on time (and check whether the coefficient differs significantly from zero).

If other factors affecting the probability are being studied besides time, you can try adding them to the logistic regression as well.
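A minimal sketch of that check, assuming a single predictor (time rescaled to [0, 1]) and a Wald test on its coefficient. The fit uses plain Newton iterations to stay dependency-free; the synthetic data, in which the probability drifts from 0.3 to 0.7, and all names are illustrative assumptions.

```python
import math
import random

def fit_logit_on_time(t, y, iters=25):
    """Fit P(y=1) = sigmoid(a + b*t) by Newton's method.
    Returns (a, b, z), where z = b / se(b) is the Wald statistic."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ti, yi in zip(t, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * ti)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * ti
            h00 += w                # information matrix entries
            h01 += w * ti
            h11 += w * ti * ti
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Standard error of b from the inverse information matrix
    # (taken from the last iteration, which is at convergence in practice)
    se_b = math.sqrt(h00 / det)
    return a, b, b / se_b

# Synthetic example: the probability of y=1 drifts from 0.3 to 0.7 over time
rng = random.Random(0)
n = 2000
t = [i / (n - 1) for i in range(n)]
y = [1 if rng.random() < 0.3 + 0.4 * ti else 0 for ti in t]

a, b, z = fit_logit_on_time(t, y)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(b, z, p_value)
```

A small p-value suggests that the probability does change with time. Additional factors would enter as extra columns, at which point a statistics package is more practical than hand-written Newton steps.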

elibrarius #:

Or maybe it is easier to add another predictor: the distance of the data row from the current one. The forest can work out by itself that data older than 8 months is bad for the current forecast. Then there would be a simple split: newer than 8 months (with better leaves) and older than 8 months (with worse leaves).
On the train set they all learn well, of course. It has to be checked on test/cross-validation. But how? That is not clear. What matters here is not even the importance of the predictor, but the importance of the split.

Today I added such a predictor: the distance from the current bar. You can use the bar number or simply the time. I took the time.

I found experimentally that a training history length of 1 month gives the best forward result.
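A toy sketch of how such a history length could be picked by comparing forward results. The majority-class "model", the synthetic label series with a regime change, and all names are hypothetical stand-ins, not the setup used in this post.

```python
def majority_model(y_train):
    """Trivial stand-in model: always predict the more frequent class."""
    return 1 if 2 * sum(y_train) >= len(y_train) else 0

def forward_accuracy(y, split, window, horizon):
    """Train on the `window` bars before `split`, test on the next `horizon` bars."""
    train = y[split - window:split]
    test = y[split:split + horizon]
    pred = majority_model(train)
    return sum(1 for v in test if v == pred) / len(test)

def best_window(y, split, candidates, horizon):
    """Return the history length with the best forward accuracy, plus all scores."""
    scores = {w: forward_accuracy(y, split, w, horizon) for w in candidates}
    return max(scores, key=scores.get), scores

# Synthetic labels with a regime change at bar 60
y = [0] * 60 + [1] * 40
best_w, scores = best_window(y, split=85, candidates=[10, 20, 40, 80], horizon=15)
print(best_w, scores)
```

Here the longest window reaches back into the old regime and fails on the forward segment. In practice the choice should be repeated over several splits, since a length picked on a single forward segment can itself be overfitted.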

The assumption that adding a predictor of distance from the first data row would help turned out to be wrong. In practice, the forward only got worse, with 1 month of data as well as with 2 and 10.

Suppose 2 months of data are fed in and the tree finds a split at 1 month. One of its branches is then trained on the same data as the experimentally chosen 1-month history. The other branch is also trained on its own month of data, and it learns that month well, not badly (contrary to what I assumed at the start). It is bad only for the forward; on the train set it is simply memorized. As a result, the model averages the results for both months, and the forward is worse than if it had been trained on 1 month only.

Conclusion: you should not feed in global time or the row number. Cyclic time features (day of the week, hour, minute) may be useful, but this needs checking.
For each target (and/or set of predictors) the length of history for training will have to be selected/optimized.
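For illustration, the two kinds of time predictors discussed here could be derived from bar timestamps roughly as follows. This is a sketch with hypothetical names; the timestamps are arbitrary examples.

```python
from datetime import datetime

def time_features(bar_times):
    """bar_times: list of datetime objects, oldest first;
    the last element is treated as the current bar."""
    now = bar_times[-1]
    rows = []
    for ts in bar_times:
        rows.append({
            # Global distance from the current bar (made the forward worse above)
            "age_seconds": (now - ts).total_seconds(),
            # Cyclic features: they repeat every week / day / hour
            "day_of_week": ts.weekday(),  # 0 = Monday
            "hour": ts.hour,
            "minute": ts.minute,
        })
    return rows

bars = [
    datetime(2022, 3, 7, 9, 30),
    datetime(2022, 3, 7, 10, 30),
    datetime(2022, 3, 8, 10, 45),
]
rows = time_features(bars)
for row in rows:
    print(row)
```

The cyclic values stay in the same range in any training window, while the global distance keeps growing with history, which is what lets a tree split off old data.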

 

I checked the earlier point: "Cyclic time features (day of the week, hour, minute) may be useful, but this needs checking."

Minutes have almost no effect, no more than 0.5% change
Hours and days of the week have an effect. Changes are about 3-5%.

I build 2 models simultaneously: the first for buy, the second for sell.
Buy models work better without time, by 4-5%, and sell models work better with time, by the same 4-5%. It looks as if about 5% of sells follow a schedule, while buys follow other principles.
