Machine learning in trading: theory, models, practice and algo-trading - page 646

 
Dr. Trader:

There are 2 types to choose from -
1=mean decrease in accuracy (that's probably what mda is, it matches the first letters)
2=mean decrease in node impurity

There are also dedicated packages: VSURF, varSelRF, Boruta.
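To make type 2 concrete, here is a minimal sketch of the "decrease in node impurity" for a single split, using the Gini index. The function names, the toy data and the threshold are all made up for illustration; a real forest sums this quantity over every split on a variable and averages it over all trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(xs, ys, threshold):
    """Drop in Gini impurity when the node is split on xs <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    # Impurity of the children, weighted by the share of samples in each.
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(ys) - children


# Toy data (assumed): feature_a separates the classes, feature_b does not.
labels = [0, 0, 0, 1, 1, 1]
feature_a = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
feature_b = [1.0, 8.0, 3.0, 2.0, 9.0, 7.0]

gain_a = impurity_decrease(feature_a, labels, threshold=5.0)  # 0.5
gain_b = impurity_decrease(feature_b, labels, threshold=5.0)  # about 0.056
print(gain_a, gain_b)
```

The informative feature removes all impurity in one split, while the noisy one barely changes it, which is exactly what the type 2 ranking picks up.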

 
Ivan Negreshniy:

And there are also dedicated packages: VSURF, varSelRF, Boruta.

Which one is better? )

 
Maxim Dmitrievsky:

Which one is better? )

And that is only a small part of the R packages that work with random forests; Boruta seems to be available for Python as well.

The best ones, IMHO, are those that offer more variants with less fuss for the user. Best of all is full automation: analyze the model and go through all the suitable ones. )

 
Ivan Negreshniy:

And that is only a small part of the R packages that work with random forests; Boruta seems to be available for Python as well.

The best ones, IMHO, are those that offer more variants with less fuss for the user. Best of all is full automation: analyze the model and go through all the suitable ones. )

I've been thinking of rewriting some of the random forest tools for MT5, to have them at hand.

I still have not got around to it.

I don't give a damn about the whole bunch of R stuff, you can't learn it in a lifetime... :) and you can't use them all
 
Maxim Dmitrievsky:

I've been thinking of rewriting some of the random forest tools for MT5, to have them at hand.

I still can't get to it.

I don't care at all about all this stuff from R, you can't learn it in a lifetime... :) moreover, you can't use them all

If you want to write it yourself, start with Breiman's classic feature importance: shuffle the features in the training set one at a time and measure each one's importance by the change in MSE on the OOB samples, or by the Gini index in the tree splits.
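A minimal sketch of the permutation scheme, under loud assumptions: `predict` is a hand-written stand-in for an already trained forest, a plain held-out set replaces the per-tree OOB samples of Breiman's original procedure, and the data is synthetic. None of these names come from a real library.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(predict, shuffled, targets) - baseline)
    return importances


def predict(row):
    """Stand-in model (assumed): strong use of feature 0, weak of 1, none of 2."""
    return 2.0 * row[0] + 0.1 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
targets = [predict(r) for r in rows]  # noise-free targets, for clarity only

importances = permutation_importance(predict, rows, targets, n_features=3)
print(importances)  # feature 0 largest, feature 1 small, feature 2 exactly 0.0
```

Since the routine only needs a `predict` callable, the same loop would work with any model, which is one reason it is a convenient thing to port first.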

In theory this should work for time series too, so you could drop the required number of least significant elements and bring patterns of different lengths to the same dimensionality.
Random forest - Wikipedia
 
Ivan Negreshniy:

If you want to write it yourself, start with Breiman's classic feature importance: shuffle the features in the training set one at a time and measure each one's importance by the change in MSE on the OOB samples, or by the Gini index in the tree splits.

In theory this should work for time series too, so you could drop the required number of least significant elements and bring patterns of different lengths to the same dimensionality.

Yes, I want Gini for starters.

And in general forests are easier to use; it's the same optimization.

 
In my opinion, price rises more slowly than it falls, i.e. there will be different patterns to memorize. But trying it and comparing probably won't take long.
 
elibrarius:
In my opinion, price rises more slowly than it falls, i.e. there will be different patterns to memorize. But trying it and comparing probably won't take long.

it's the same in forex :) there are two currencies

I wonder whether mutually exclusive examples will turn up... in fact there shouldn't be many.

 
Maxim Dmitrievsky:

it's the same in forex :) there are two currencies

I wonder whether mutually exclusive examples will turn up... in fact there shouldn't be many.

So that's fine. Let N be class 1, and M be class 2, and let the classes overlap, which, imho, should be the case.

Then the probabilities are Pn = n/N and Pm = m/M. If the probability is > 0.5, then DM can handle it by itself. In my experience the overlap is around 20-40%, i.e. from 20 to 40% of trades will be wrong.
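A toy illustration of how class overlap turns directly into an error rate. Everything here is an assumption made for the sketch: the two classes are modelled as unit-variance Gaussians one standard deviation apart, the classes are equally frequent, and the decision threshold sits at the midpoint. It is not the model used in the discussion, only a way to see the arithmetic.

```python
from statistics import NormalDist

# Assumed one-dimensional feature distributions for the two classes.
class_1 = NormalDist(mu=0.0, sigma=1.0)
class_2 = NormalDist(mu=1.0, sigma=1.0)
threshold = 0.5  # predict class 1 below the midpoint, class 2 above it

# Share of each class that falls on the wrong side of the threshold.
err_1 = 1.0 - class_1.cdf(threshold)
err_2 = class_2.cdf(threshold)

# With equally frequent classes the overall error is the average of the two.
error_rate = 0.5 * (err_1 + err_2)
print(round(error_rate, 3))  # about 0.309
```

In this setup roughly 31% of decisions are wrong however good the classifier is, which falls inside the 20-40% range mentioned above.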

 
Yuriy Asaulenko:

So that's fine. Let N be class 1, and M be class 2, and let the classes overlap, which, imho, should be the case.

Then the probabilities are Pn = n/N and Pm = m/M. If the probability is > 0.5, then DM can handle it by itself. In my experience the overlap is around 20-40%, i.e. from 20 to 40% of trades will be wrong.

Well, essentially yes: you can't spoil porridge with butter, and there's less overfitting. The key to efficiency may well lie in such seemingly small things.

except I don't have classes, I have regression.

You can also slightly transform the original series (an affine transform, for example) and feed in its increments too (I guess this is what Monte Carlo is for?).
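A minimal sketch of that kind of augmentation, assuming the series is a plain list of prices. The function names, the price values and the scale/shift parameters are made up for illustration.

```python
def affine(series, scale, shift):
    """Affine transform of every point: scale * x + shift."""
    return [scale * x + shift for x in series]


def increments(series):
    """First differences of the series: x[t + 1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]


# Made-up prices, for illustration only.
prices = [1.1000, 1.1010, 1.1005, 1.1020, 1.1015]

augmented = affine(prices, scale=1.02, shift=0.001)
base_increments = increments(prices)
augmented_increments = increments(augmented)

print(base_increments)
print(augmented_increments)
```

Note that the shift cancels out in the increments and only the scale survives, so varying the shift alone adds nothing new if the model is fed increments only.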

Anyway, I already like what I'm doing. I have 3 weeks left until I finish with the neural network :))) Either the grail or to hell with it. Place your bets :)))
