Discussion of article "Applying Monte Carlo method in reinforcement learning" - page 3

 
elibrarius:

Look at Vladimir's earlier articles (the 2nd or 3rd one), where in one of the R packages this was determined with the help of random forests. It took a very long time to compute (many times longer than training the main neural network). Whether it was an exhaustive search or some kind of genetic algorithm - you would have to look in the package documentation.
Most likely it was optimised in some way.

Thanks for the information. However, there it is a question of independent evaluation of the predictors, whereas it is the group method that is of interest here.

 
Aleksey Vyazmikin:

Thanks for the information. However, there it is a question of independent evaluation of the predictors, whereas it is the group method that is of interest here.

Here is an article about the interaction of input variables: https://www.mql5.com/en/articles/2029
Evaluation and selection of variables for machine learning models
  • www.mql5.com
The article considers the specifics of selecting, pre-processing and evaluating input variables for use in machine learning models. Multiple normalisation methods and their particularities are examined. Important aspects of this process that strongly influence the final result of model training are pointed out. New and...
 
elibrarius:
Here is an article about the interaction of input variables: https://www.mql5.com/en/articles/2029

Yes, thanks, but it says this about group interaction:

  • Wrapper. Wrapper methods estimate various models using procedures that add and/or remove predictors to find the best combination that optimises model performance. In essence, wrapper methods are search algorithms that treat predictors as inputs and use model efficiency as outputs to be optimised. There are many ways to enumerate predictors (recursive deletion/addition, genetic algorithms, simulated annealing, and many others).

Both approaches have their advantages and disadvantages. Filter methods tend to be computationally more efficient than wrapper methods, but the selection criteria are not directly related to model performance. A disadvantage of the wrapper method is that estimating multiple models (which may require tuning hyperparameters) leads to a dramatic increase in computational time and usually results in overfitting the model.

In the present article we will not consider wrapper methods; instead we will consider new methods and approaches within the filter methods which, in my opinion, eliminate the above-mentioned disadvantages.
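
For illustration, a wrapper in the sense described above can be sketched as a greedy forward-selection loop; the estimator, the 3-fold CV scoring and the synthetic data below are assumptions for the example, not the article's setup:

# Hypothetical wrapper method: greedy forward selection of predictors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf

while remaining:
    # Try adding each remaining predictor and keep the one that helps most.
    scores = {j: cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                                 X[:, selected + [j]], y, cv=3).mean()
              for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:
        break  # no remaining predictor improves the model
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected predictors:", selected, "cv score:", round(best_score, 3))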


 
Aleksey Vyazmikin:

In the present article we will not consider wrapper methods; instead we will consider new methods and approaches within the filter methods which, in my opinion, eliminate the above-mentioned disadvantages.


Consider these articles as just a textbook on neural networks and on R; there is not a single robust system in there. Many approaches can also be misinterpreted or distorted, so it is better to read primary sources. I have already posted material from a university professor about not trusting the default impurity importance of forests: https://explained.ai/rf-importance/index.html.

At the same time, decorrelation and permutation are reliable and sufficient for the vast majority of cases.
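
A minimal sketch of the decorrelation step (the 0.9 threshold and the pandas layout are assumptions for illustration, not a prescribed procedure):

# Hypothetical decorrelation: drop one feature from each highly correlated pair.
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({"a": a,
                   "b": a + rng.normal(scale=0.01, size=200),  # near-duplicate of "a"
                   "c": rng.normal(size=200)})
print(drop_correlated(df).columns.tolist())  # -> ['a', 'c']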

Maybe it's not my question, but I can't bear to watch you suffer :)

Beware Default Random Forest Importances
  • explained.ai
 
Maxim Dmitrievsky:

Consider these articles as just a textbook on neural networks and on R; there is not a single robust system in there. Many approaches can also be misinterpreted or distorted, so it is better to read primary sources. I have already posted material from a university professor about not trusting the default impurity importance of forests: https://explained.ai/rf-importance/index.html.

Maybe it's not my question, but I can't bear to watch you suffer :)

That's just it - nobody knows what the right way is. For one person one thing works on their model, for another something else, or maybe it is all randomness, under which everyone is trying to put a scientific basis of evidence and justification.

I am dealing with this myself, and so far nothing better than a complete enumeration of predictor groups and evaluation of their models comes to mind for speeding up the process. For me, the difficulty is rather in preserving the relationships within the large groups when subsequently splitting them into smaller ones in order to speed up the enumeration - this part is not properly automated.

 
Aleksey Vyazmikin:

You only need to delve into it and internalise it once, because permutation is universal for any ML method, not just RF, and it is computationally quite cheap.
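
A minimal, model-agnostic sketch of permutation importance (my own wording; the fitted model and the metric passed in are assumptions) - the same function works for a forest, a boosting model or a neural network wrapper, which is the point about it being universal:

# Hypothetical permutation importance: shuffle one column at a time and
# measure how much the validation score drops for any already fitted model.
# X_val is assumed to be a 2-D numpy array.
import numpy as np

def permutation_importance(model, X_val, y_val, metric, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    baseline = metric(y_val, model.predict(X_val))
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            rng.shuffle(X_perm[:, j])   # destroy the j-th predictor's information
            drops.append(baseline - metric(y_val, model.predict(X_perm)))
        importances[j] = np.mean(drops)  # average drop in the score
    return importances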

 
Maxim Dmitrievsky:

You only need to delve into it and internalise it once, because permutation is universal for any ML method, not just RF, and it is computationally quite cheap.

Ordinary permutation - yes, of course, but here it is different: we split the predictors into 9 groups, identify by some method the groups for which the average result of the models is worse or, on the contrary, better, and then create new groups with a different division. For example, we break the bad groups into subgroups in order to find the junk to discard, or to understand why it affects the overall picture so strongly, and so on round and round. In this way we can identify the best/worst groups of predictors in terms of their interaction with each other. The task is to automatically make a new breakdown into groups after the models have been evaluated, taking into account the experience gained, and then to train again. The point is that the grouping is not random.
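
To make the described loop concrete, here is a rough sketch of a single iteration; the group sizes, the model, the scoring and the split rule are placeholder assumptions, not the automated procedure being sought:

# Hypothetical single pass of the group scheme: score a model per predictor
# group, then split the worst group into subgroups for the next round.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=27, random_state=1)

# Initial breakdown: 9 groups of 3 predictors each.
groups = [list(range(i, i + 3)) for i in range(0, 27, 3)]

def score_group(cols):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=3).mean()

scores = [score_group(g) for g in groups]
worst = int(np.argmin(scores))
print("worst group:", groups[worst], "score:", round(scores[worst], 3))

# Next round: split the worst group into subgroups and re-evaluate,
# keeping the other groups intact as the experience gained so far.
half = len(groups[worst]) // 2
new_groups = (groups[:worst]
              + [groups[worst][:half], groups[worst][half:]]
              + groups[worst + 1:])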

 
Aleksey Vyazmikin:

Ordinary permutation - yes, of course, but here it is different: we split the predictors into 9 groups, identify by some method the groups for which the average result of the models is worse or, on the contrary, better, and then create new groups with a different division. For example, we break the bad groups into subgroups in order to find the junk to discard, or to understand why it affects the overall picture so strongly, and so on round and round. In this way we can identify the best/worst groups of predictors in terms of their interaction with each other. The task is to automatically make a new breakdown into groups after the models have been evaluated, taking into account the experience gained, and then to train again. The point is that the grouping is not random.

There is no interaction between them, I already wrote about this above. Rearranging them or changing their number does not change the importances. You can check it. Moreover, a smaller weight simply means a smaller contribution to the model, so the bad ones do not even need to be removed when using the model, although it is desirable to get rid of unnecessary noise.

 
Maxim Dmitrievsky:

There is no interaction between them, I already wrote about this above. Rearranging them or changing their number does not change the importances. You can check it. Moreover, a smaller weight simply means a smaller contribution to the model, so the bad ones do not even need to be removed when using the model, although it is desirable to get rid of unnecessary noise.

I have a different conclusion.

 
Aleksey Vyazmikin:

I have a different conclusion.

as you wish :)

But this is trivial stuff. Importance depends on the variance of the feature (almost always, except for very simple models). A forest does not apply any transformations to the features, does not multiply or divide them by each other, etc., but simply scatters their values across the nodes, so there are no interactions, only splitting.
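
A small toy sketch of the point about importance and feature variance, in the spirit of the explained.ai article linked above (the dataset and sizes are assumptions, not taken from that article):

# Hypothetical demo: a pure-noise continuous feature still receives a
# non-zero impurity-based importance, because the trees find many candidate
# split points on it; hence default importances should be treated with care.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
noise = rng.normal(size=(500, 1))                 # random noise, unrelated to y
X_aug = np.hstack([X, noise])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)
print(rf.feature_importances_)                    # the last value is > 0 despite being noise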