Machine learning in trading: theory, models, practice and algo-trading - page 3022

 
mytarmailS #:

without understanding what the market is, there won't even be a packet of pasta.

The FF (fitness function) is a garage full of Lexuses, I don't know how else to explain it.
 
Maxim Dmitrievsky #:
The FF is a garage full of Lexuses, I don't know how else to explain it.

I don't have to explain it to you. I'm a big boy.

You have one opinion, I have another.

My opinion is my experience, you can't change it with words.

 
mytarmailS #:

I don't have to explain. I'm a big boy.

You have one opinion, I have another.

My opinion is my experience, words can't change it.

The sad thing is that it's not an opinion, it's a fact.
You can draw a bald devil instead of an FF and still fit it to the data.
 
Maxim Dmitrievsky #:
The sad thing is that it's not an opinion, it's a fact.
You can draw a bald devil instead of an FF and still fit it to the data.

you don't even realise what kind of shit you're saying right now))))

As if the optimiser in MT isn't an optimiser and doesn't optimise an FF.

 
mytarmailS #:

you don't even realise what kind of shit you're saying right now ))

As if the optimiser in MT isn't an optimiser and doesn't optimise an FF.

What does MT have to do with it? Have you ever heard of production optimisation, where you have real interdependent processes and need to improve their efficiency?

The same goes for SL/TP optimisation of a finished model.

And you're creating a bald devil out of a rubbish heap using FF.

Really, let's stop; this is pointless, as if I'm talking to a schoolboy.
 

The TORCH book for R is finally out.

If there are any future DL wizards, go for it.

 
Rorschach #:

Have you tried this approach? (Look for the Model Interpretation section about halfway down the page.)

Thanks for the link - it will be very useful when I finally start experimenting in Python!

I take it this is some new style of presenting a book? Is there any other material like it?

To answer the question: nothing was used directly when selecting the leaves.

I did not work with a forest of decision trees, so a number of the suggested techniques simply did not apply. However, I have used something similar: for example, the estimated error variance of an individual leaf was used to determine that leaf's weight in the ensemble.
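Roughly, the weighting idea looks like this (just a sketch in Python with made-up names, not the exact code I used):

import numpy as np

# Sketch: weight each leaf by the inverse variance of its historical error,
# so that "noisier" leaves get a smaller say in the ensemble vote.
def leaf_weights(leaf_errors):
    # leaf_errors: one array of past errors per leaf (illustrative representation)
    variances = np.array([np.var(e) for e in leaf_errors]) + 1e-9  # guard against zero variance
    w = 1.0 / variances
    return w / w.sum()                                             # normalise so the weights sum to 1

errors = [np.random.normal(0, s, 200) for s in (0.1, 0.3, 0.9)]
print(leaf_weights(errors))  # the low-variance leaf dominates the vote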

Split-based predictor importance is also available in CatBoost, but in gradient boosting you have to adjust your interpretation of these indicators, because the trees are dependent and built sequentially. The metric itself is quite debatable, since it evaluates how the trees were constructed, and the greedy principle does not work well for all data. Still, I used importance scores averaged over a hundred models across 8 sample intervals to select predictors for CatBoost models, and on average this method improved the training results. The experiment was described in detail in this thread.
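Schematically the averaging looked something like this (a simplified sketch; the CatBoost calls are real, but the interval splitting, model settings and names here are illustrative, not my actual script):

import numpy as np
from catboost import CatBoostClassifier

# Sketch: average split-based feature importances over several sample intervals
# and many seeds; predictors whose averaged score stays near zero would then be
# excluded before the final training.
def averaged_importance(X, y, n_intervals=8, models_per_interval=100):
    scores = []
    bounds = np.linspace(0, len(X), n_intervals + 1, dtype=int)
    for a, b in zip(bounds[:-1], bounds[1:]):
        for seed in range(models_per_interval):
            model = CatBoostClassifier(iterations=200, random_seed=seed, verbose=False)
            model.fit(X[a:b], y[a:b])
            scores.append(model.get_feature_importance())
    return np.mean(scores, axis=0)  # one averaged importance value per predictor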

I have not tried frequency correlation in the proposed form - I came up with my own method of grouping binary predictors and leaves, which also makes it possible to discard binary predictors and leaves that are too similar to each other. I suspect the Python implementation would run faster, since my algorithm is not optimal; they should be compared to know for sure.
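As a rough analogue of that grouping (not my actual algorithm, just an illustration of the idea of discarding near-duplicates):

import numpy as np

# Sketch: keep a binary column only if it does not agree with an already-kept column
# on more than `threshold` of the rows.
def drop_similar_binary(X_bin, threshold=0.95):
    keep = []
    for j in range(X_bin.shape[1]):
        col = X_bin[:, j]
        if all(np.mean(col == X_bin[:, k]) < threshold for k in keep):
            keep.append(j)
    return keep  # indices of the retained predictors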

The idea of screening out predictors that have changed strongly seems interesting; I should try it. In the experiment I described above, though, I handled this simply by not taking such predictors into the final training. It would be better to understand how to detect a variable's tendency to change from its historical behaviour, as well as the moment when the fluctuations have irreversibly shifted the average range of the predictor's probability distribution. I have ideas on paper - they still need to be coded.
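One possible way to flag such a shift (just a guess at an implementation, not the ideas on paper) is to compare an early window of the predictor with a recent one, for instance with a two-sample Kolmogorov-Smirnov test:

import numpy as np
from scipy.stats import ks_2samp

# Sketch: has the predictor's distribution drifted between the old and the recent part of its history?
def has_shifted(series, split=0.7, alpha=0.01):
    n = int(len(series) * split)
    old, new = series[:n], series[n:]
    stat, p_value = ks_2samp(old, new)
    return p_value < alpha  # True if the recent distribution differs significantly

x = np.concatenate([np.random.normal(0, 1, 700), np.random.normal(0.5, 1, 300)])
print(has_shifted(x))  # likely True: the mean of the second part has moved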

Evaluating each predictor's contribution to the decision for a particular row as a visualisation is fun, but with a large number of predictors in the model it is of little use. However, I did something similar - I posted an example here in the thread - where colour highlighted the significance of the leaf response and showed how many leaves in the model were used to predict each row. It turned out that most of the leaves stop being activated in the model, i.e. their patterns simply stop occurring - few people even think about this.
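The counting itself is trivial if the model is stored as a set of leaf rules (the representation below is an assumption, not how my model is actually stored):

import numpy as np

# Sketch: for each row, count how many of the selected leaves (rules) are activated,
# and find leaves that never fire on the given sample at all.
def leaf_activation_stats(X, leaf_rules):
    # leaf_rules: list of callables, each returning a boolean mask over the rows of X
    masks = np.column_stack([rule(X) for rule in leaf_rules])
    per_row = masks.sum(axis=1)                         # activated leaves per row
    dead_leaves = np.where(masks.sum(axis=0) == 0)[0]   # leaves whose pattern no longer occurs
    return per_row, dead_leaves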

Did I miss any of the ideas mentioned there? If so, please point them out specifically and I will say whether I have used them or not.

I did not quite understand the idea about encoding categorical features for use in neural networks - it refers to earlier material.

 
Slava #:

I see. Loss functions treat matrices the same way as vectors. We did not finish this part (the axis parameter was not added).

That is, in your example you need to compute it row by row.

Thank you for your attention.

Okay, I understand. But there is an issue with vectors, in particular with LOSS_BCE:

double bce(const vector &truth, vector &pred)   // custom binary cross-entropy for comparison
{
   double sum = 0;
   pred.Clip(DBL_EPSILON, 1 - DBL_EPSILON);     // avoid log(0)
   const int n = (int)truth.Size();
   for(int i = 0; i < n; ++i)
   {
      sum += truth[i] * MathLog(pred[i]) + (1 - truth[i]) * MathLog(1 - pred[i]);
   }
   return sum / -n;                             // negative mean log-likelihood
}

void OnStart()
{
   vector actual_values = {0, 1, 0, 0, 0, 0};
   vector predicted_values = {.5, .7, .2, .3, .5, .6};     // 0.53984624 - keras (correct answer)
   Print(actual_values.Loss(predicted_values, LOSS_BCE));  // 0.6798329317196582 - mql5 API
   Print(bce(actual_values, predicted_values));            // 0.5398464220309535 - custom
}

Again the API result does not match the expected result.
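For reference, the same numbers checked outside MQL5 - plain NumPy (an independent check, not part of the terminal API) reproduces the keras value:

import numpy as np

truth = np.array([0, 1, 0, 0, 0, 0], dtype=float)
pred  = np.array([.5, .7, .2, .3, .5, .6])

eps = np.finfo(float).eps
pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)

bce = -np.mean(truth * np.log(pred) + (1 - truth) * np.log(1 - pred))
print(bce)  # 0.53984642... - agrees with keras and the custom MQL5 function, not with LOSS_BCE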

 
Aleksey Vyazmikin #:

Thanks for the link - it will be very useful when I finally start experimenting in Python!

I take it this is some new style of presenting a book? Is there any other material like it?

To answer the question: nothing was used directly when selecting the leaves.

I did not work with a forest of decision trees, so a number of the suggested techniques simply did not apply. However, I have used something similar: for example, the estimated error variance of an individual leaf was used to determine that leaf's weight in the ensemble.

Split-based predictor importance is also available in CatBoost, but in gradient boosting you have to adjust your interpretation of these indicators, because the trees are dependent and built sequentially. The metric itself is quite debatable, since it evaluates how the trees were constructed, and the greedy principle does not work well for all data. Still, I used importance scores averaged over a hundred models across 8 sample intervals to select predictors for CatBoost models, and on average this method improved the training results. The experiment was described in detail in this thread.

I have not tried frequency correlation in the proposed form - I came up with my own method of grouping binary predictors and leaves, which also makes it possible to discard binary predictors and leaves that are too similar to each other. I suspect the Python implementation would run faster, since my algorithm is not optimal; they should be compared to know for sure.

The idea of screening out predictors that have changed strongly seems interesting; I should try it. In the experiment I described above, though, I handled this simply by not taking such predictors into the final training. It would be better to understand how to detect a variable's tendency to change from its historical behaviour, as well as the moment when the fluctuations have irreversibly shifted the average range of the predictor's probability distribution. I have ideas on paper - they still need to be coded.

Evaluating each predictor's contribution to the decision for a particular row as a visualisation is fun, but with a large number of predictors in the model it is of little use. However, I did something similar - I posted an example here in the thread - where colour highlighted the significance of the leaf response and showed how many leaves in the model were used to predict each row. It turned out that most of the leaves stop being activated in the model, i.e. their patterns simply stop occurring - few people even think about this.

Did I miss any of the ideas mentioned there? If so, please point them out specifically and I will say whether I have used them or not.

I did not quite understand the idea about encoding categorical features for use in neural networks - it refers to earlier material.

This is easily automated and works without human intervention.

I showed a similar algorithm in the last article.

In essence, it is filtering out the model's errors and putting them into a separate "do not trade" class, preferably through a second model that learns to separate the grain from the chaff,

so that only the grain remains in the first model.

It's the same as with tree rules, just approached from the other side. But rules have to be extracted and compared with each other, whereas here the output is already a refined trading system (TS).
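In pseudo-form the loop looks roughly like this (a simplified sketch, not the code from the article; the CatBoost calls are real, everything else here is illustrative):

import numpy as np
from catboost import CatBoostClassifier

# Sketch: a second (meta) model learns to separate "grain" (rows the base model gets right)
# from "chaff" (its errors); the base model is then retrained only on the grain.
# X and y are assumed to be NumPy arrays.
def fit_with_filter(X, y, n_iterations=10):
    base = CatBoostClassifier(iterations=300, verbose=False).fit(X, y)
    meta = None
    for _ in range(n_iterations):
        errors = (np.ravel(base.predict(X)) != y).astype(int)        # 1 = chaff, 0 = grain
        meta = CatBoostClassifier(iterations=300, verbose=False).fit(X, errors)
        keep = np.ravel(meta.predict(X)) == 0                        # rows the meta-model trusts
        base = CatBoostClassifier(iterations=300, verbose=False).fit(X[keep], y[keep])
    return base, meta  # trade only where the meta-model predicts "grain"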

For example, here is the first iteration of separating the grain from the chaff (to the left of the vertical dotted line is the OOS):

And here is the 10th:


 
Maxim Dmitrievsky #:

This is easily automated and works without human intervention.

I showed a similar algorithm in the last article.

In essence, it is filtering out the model's errors and putting them into a separate "do not trade" class, preferably through a second model that learns to separate the grain from the chaff,

so that only the grain remains in the first model.

It's the same as with tree rules, just approached from the other side. But rules have to be extracted and compared with each other, whereas here the output is already a refined trading system (TS).

I argued above that you cannot simply discard the model's errors.

I would like to change my opinion.

But for that, the following is needed:

an evaluation of the initial model on the training sample and outside it;

an evaluation of the "cleaned" model on a sample outside training that does NOT coincide with the previous two.

Can we do that?
