Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 13

[Deleted]  
Evgeni Gavrilovi:

Yes, it is there.

It says

#include <MT4Orders.mqh>

#include <Trade\AccountInfo.mqh>

#include <cat_model.mqh>

and most importantly, when loading the mqh directly from a Jupyter notebook everything works fine, which surprised me

I see... well, something is wrong with the transfer to colab... I haven't looked at it yet, I'm busy with another article )
 
Maxim Dmitrievsky:
I see... well, something is wrong with the transfer to colab... I haven't looked at it yet, I'm busy with another article )

I made a screen recording showing how I load cat_model.mqh in Colab.


[Deleted]  
Evgeni Gavrilovi:

I made a screen recording showing how I load cat_model.mqh in Colab.


When you save the file on your computer and in Colab, do the look_back setting and the list of MAs match? They must be the same; otherwise the wrong number of features is saved into the model and you get an out-of-bounds array error like yours.
 
Maxim Dmitrievsky:
When you save the file on your computer and in Colab, do the look_back setting and the list of MAs match? They must be the same; otherwise the wrong number of features is saved into the model and you get an out-of-bounds array error like yours.

Yes, it's a perfect match.

The thing is that without the get_prices function the recording fails with an error; maybe the problem is in the test file?

The number of days differs by a factor of two: the test file covers the last 6 months, while the training file covers only the last 3 months.

[Deleted]  
Evgeni Gavrilovi:

Yeah, it's a perfect match.

The thing is that without the get_prices function the recording fails with an error; maybe the problem is in the test file?

The number of days differs by a factor of two: the test file covers the last 6 months, while the training file covers only the last 3 months.

No, I think it's the parser. Somewhere the number of features is set incorrectly when the model is saved, i.e. it trains on one number but the parser saves another. We'll figure it out later; I just don't have time to dig into it right now.
 
Maxim Dmitrievsky:
No, I think it's the parser. Somewhere the number of features is set incorrectly when the model is saved, i.e. it trains on one number but the parser saves another. We'll figure it out later; I just don't have time to dig into it right now.

OK)

 

I solved the data-loading issue in Colab by trying all the options.

You need to write pr = pd.read_csv('file.csv', sep=';') directly inside the get_prices function and then return this value with return pr.dropna().
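A minimal sketch of the fix described above. The file name 'file.csv' and the ';' separator come from the post; the column name in the usage note below is only a placeholder for whatever your export actually contains.

```python
import pandas as pd

def get_prices(path='file.csv'):
    """Load exported quotes for the Colab workflow.

    The path and the ';' separator are taken from the post above;
    adjust both to match your own CSV export.
    """
    # Read the CSV directly inside the function, as suggested in the post.
    pr = pd.read_csv(path, sep=';')
    # Drop rows with missing values so downstream feature code never sees NaNs.
    return pr.dropna()
```

Calling pr.dropna() inside the function (rather than at the call site) means every consumer of get_prices sees clean rows, which avoids the NaN-related failures mentioned above.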

 

Plugged in a random forest. It starts to work stably with 10,000 samples and 100 trees.
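A rough sketch of what "plugged in a random forest" with those settings might look like. Everything here is an assumption except the two numbers from the post (10,000 samples, 100 trees): the synthetic features and labels merely stand in for the real price-derived features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in data: 10 000 samples (as in the post), 5 features.
X = rng.normal(size=(10_000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy label rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 100 trees, as in the post.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)
print(f"test accuracy: {model.score(X_te, y_te):.3f}")
```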

[Deleted]  
welimorn:

Plugged in a random forest. It starts to work stably with 10,000 samples and 100 trees.

A curious approach to class balancing. It could be adapted for our purposes. It just caught my eye.

https://towardsdatascience.com/augmenting-categorical-datasets-with-synthetic-data-for-machine-learning-a25095d6d7c8

Augmenting categorical datasets with synthetic data for machine learning.
  • Egor Korneev
  • towardsdatascience.com
Consider a hypothetical but common scenario. You need to build a classifier to assign a sample to a population group. You have a sizable training dataset of one million samples. It has been cleaned, prepared and labeled. The few continuous variables are already normalized, and categorical variables, representing the majority of features, are...
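A hedged sketch of the general idea behind the linked article, adapted to this thread's GMM setting: fit a generative model on the minority class only and sample synthetic rows until the classes are balanced. The class sizes, feature dimensions, and the choice of GaussianMixture are all assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Imbalanced toy data: 900 majority rows, 100 minority rows (assumption).
X_major = rng.normal(0.0, 1.0, size=(900, 3))
X_minor = rng.normal(2.0, 0.5, size=(100, 3))

# Fit a small GMM on the minority class only, then sample synthetic rows
# until both classes are the same size (800 extra minority rows).
gmm = GaussianMixture(n_components=2, random_state=1).fit(X_minor)
X_synth, _ = gmm.sample(len(X_major) - len(X_minor))

X_bal = np.vstack([X_major, X_minor, X_synth])
y_bal = np.concatenate([np.zeros(900), np.ones(100 + 800)])
print(X_bal.shape, y_bal.mean())  # 1800 rows, half of them labeled 1
```

The article itself works with categorical features, which a plain GMM does not handle directly; this sketch only shows the resampling idea on continuous features.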
 
Maxim Dmitrievsky:

Anyway, I don't know, maybe my GMM is wrong ))) But I don't see any difference with it or without it; in my opinion everything is decided by the target and nothing else....


I have 60k rows of data in total.

I take the first 10k and randomly select 500 points from them.

I train the model on them either directly, or I fit the GMM first and then train the model.

I test it on the remaining 50k.

And even in the usual way you can find models just as good as the GMM ones, and they turn up with about the same frequency.

For example

a model without GMM, trained on 500 points and tested on 50k:
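The comparison protocol described above can be sketched roughly as follows. This is not the article's code: RandomForest stands in for CatBoost, the data is synthetic, and the joint-GMM resampling (fit on features plus label, then threshold the sampled label column) is one plausible reading of "train the GMM and then train the model".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # stand-in for CatBoost
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for the 60k rows; real data would come from get_prices().
X = rng.normal(size=(60_000, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# First 10k rows -> pick 500 random points for training.
idx = rng.choice(10_000, size=500, replace=False)
X_tr, y_tr = X[idx], y[idx]
X_te, y_te = X[10_000:], y[10_000:]      # test on the remaining 50k

# Variant 1: train directly on the 500 points.
direct = RandomForestClassifier(n_estimators=100, random_state=0)
direct.fit(X_tr, y_tr)

# Variant 2: fit a GMM on (features, label) jointly, resample a larger
# synthetic training set, and threshold the sampled label column.
Xy = np.column_stack([X_tr, y_tr])
gmm = GaussianMixture(n_components=5, random_state=0).fit(Xy)
S, _ = gmm.sample(5_000)
X_gmm, y_gmm = S[:, :-1], (S[:, -1] > 0.5).astype(int)
resampled = RandomForestClassifier(n_estimators=100, random_state=0)
resampled.fit(X_gmm, y_gmm)

print('direct   :', round(direct.score(X_te, y_te), 3))
print('resampled:', round(resampled.score(X_te, y_te), 3))
```

With a target this clean, both variants score similarly, which matches the observation above that the target, not the resampling, does most of the work.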


=================================================================================================

I've noticed an interesting thing worth thinking about....

There is a point of view that the market should be divided into states, with a separate strategy trading each state, but all the attempts I know of were unsuccessful: either the states cannot be distinguished, or the model trades poorly even within a "supposedly single" state.

But with this approach you can see quite clearly which market the model "likes" and which it doesn't.

Probably because the features are returns from the MA, the model works better in a flat (ranging) market.