Machine learning in trading: theory, models, practice and algo-trading - page 1373

 
Elibrarius:
I used to work with Darch in R. I found a couple of bugs and described them in the comments. After a couple of weeks of silence that Darch ended up in the CRAN archive.
I asked the developer to fix them and add a few improvements, and he did. But then he rolled everything back to the original version, erasing all the fixes. As a result, none of the modifications I had been using were available any more.
Conclusion - either do everything yourself, or use top products with very good support.

1. You need to make a fork and make all the changes yourself. They may or may not be accepted upstream, but you can always use your own version from your GitHub.

2. Of course, this is the most reliable option.

 
Maxim Dmitrievsky:

For those who strive for the complex but don't understand how beautiful the simple can be

And it is in English, of course, as requested. No, I will not translate it for you. The video has a link to a site where the articles can be translated.


This very interesting talk is not just about the simple and the complex. The emphasis is on the fact that with complex preprocessing you can reduce a problem to one that simple models can solve. It just confirms a simple truth I never tire of repeating in my articles: "The main effort should be devoted to preprocessing the predictors; models are secondary."

The speaker is hilarious.

Good luck

 
Vladimir Perervenko:

This very interesting talk is not just about the simple and the complex. The emphasis is on the fact that with complex preprocessing you can reduce a problem to one that simple models can solve. It just confirms a simple truth I never tire of repeating in my articles: "The main effort should be devoted to preprocessing the predictors; models are secondary."

The speaker is hilarious.

Good luck

XGBOOST has an input weights array with row weights. Some other packages have it too.
I thought I could put row weights there, from 1 (for fresh rows) down to 0.5 for old rows. This would increase the impact of the new data.
I tried it and did not notice any particular improvement.

Has anyone else tried it - any improvement?
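For illustration, a minimal sketch of what is meant, assuming the R xgboost package and hypothetical train_x / train_y objects (rows ordered oldest to newest, weights decaying linearly from 0.5 for the oldest to 1.0 for the newest):

library(xgboost)

n <- nrow(train_x)
w <- seq(0.5, 1.0, length.out = n)          # oldest rows get 0.5, newest get 1.0

dtrain <- xgb.DMatrix(data = as.matrix(train_x), label = train_y, weight = w)

model <- xgb.train(params = list(objective = "binary:logistic",
                                 eta = 0.1, max_depth = 4),
                   data = dtrain, nrounds = 100)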

 
elibrarius:
XGBOOST has an input weights array with row weights. Some other packages have it too.
I thought I could put row weights there, from 1 (for fresh rows) down to 0.5 for old rows. This would increase the impact of the new data.
I tried it and did not notice any particular improvement.

Has anyone else tried it - any improvement?

That's not quite right. Say you have train[2000, ] and test[500, ]. You train on train with initial example weights = 1.0, then have the trained model predict test[]. Based on the quality of each test prediction you assign that example a weight. Then you combine train and test, form a new training sample from them, train the model, test it, and so on until the whole training sample has weights obtained this way. You can additionally apply a reduction factor to them for older bars, but I haven't checked that. All this is for classification, of course.

library(magrittr)  # for %>%
now_train <- rbind(train, test) %>% tail(dim(train)[1])

I checked it with ELM, it gives good results.

Good luck
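A rough sketch of that sliding re-weighting loop, with xgboost standing in for ELM and a made-up weighting rule (weight = 1 - |y - p|), purely for illustration; X and y are hypothetical, ordered oldest to newest:

library(xgboost)
library(magrittr)

n_train <- 2000
n_test  <- 500
w <- rep(1.0, n_train)                          # initial example weights

for (i in 0:3) {                                 # a few forward steps over the data
  idx_tr <- seq(1 + i * n_test, length.out = n_train)
  idx_te <- seq(max(idx_tr) + 1, length.out = n_test)
  if (max(idx_te) > length(y)) break

  dtr   <- xgb.DMatrix(as.matrix(X[idx_tr, ]), label = y[idx_tr], weight = w)
  model <- xgb.train(list(objective = "binary:logistic"), dtr, nrounds = 50)

  p     <- predict(model, as.matrix(X[idx_te, ]))
  w_new <- 1 - abs(y[idx_te] - p)                # good predictions -> weight near 1

  # combine old and new weights and keep the tail, as with rbind(train, test) above
  w <- c(w, w_new) %>% tail(n_train)
}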

 
Vladimir Perervenko:

That's not quite right. Say you have train[2000, ] and test[500, ]. You train on train with initial example weights = 1.0, then have the trained model predict test[]. Based on the quality of each test prediction you assign that example a weight. Then you combine train and test, form a new training sample from them, train the model, test it, and so on until the whole training sample has weights obtained this way. You can additionally apply a reduction factor to them for older bars, but I haven't checked that. All this is for classification, of course.

I checked it with ELM, it gives good results.

Good luck

It's like cross-validation - divide the data into 5-10 parts and set the weights for one part of the rows in each cycle, until all of them are set. I think you should do 2-3 full passes, for balance.

It reminds me of several iterations, as in self-training, to set the best row weights.
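A hypothetical sketch of that CV-style variant (same illustrative weighting rule as above; X and y assumed as before):

library(xgboost)

k     <- 5
folds <- sample(rep(1:k, length.out = nrow(X)))
w     <- rep(NA_real_, nrow(X))

for (f in 1:k) {
  dtr   <- xgb.DMatrix(as.matrix(X[folds != f, ]), label = y[folds != f])
  model <- xgb.train(list(objective = "binary:logistic"), dtr, nrounds = 50)
  p     <- predict(model, as.matrix(X[folds == f, ]))
  w[folds == f] <- 1 - abs(y[folds == f] - p)    # weight every row by prediction quality
}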
 
elibrarius:
It's like cross-validation - divide the data into 5-10 parts and set the weights for one part of the rows in each cycle, until all of them are set. I think you should do 2-3 full passes, for balance.

It reminds me of several iterations, as in self-training, to set the best row weights.

You can check it with cross-validation.

 
elibrarius:
XGBOOST has an input weights array with row weights. Some other packages have it too.
I thought I could put row weights there, from 1 (for fresh rows) down to 0.5 for old rows. This would increase the impact of the new data.
I tried it and did not notice any particular improvement.

Has anyone else tried it - any improvement?

Well, then just train on the new ones. These weights are for aligning the model's variance across the dataset; the same thing is used in logit regression with non-constant variance (if I'm not confused about what we're talking about).

They shouldn't give any conceptually significant improvement beyond fitting the dataset.

If you need reliable generalization to the general population from a small subsample, that is what Bayesian approaches are for.
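For illustration only, the same kind of case weights in an ordinary logistic regression in R (df and w are hypothetical; quasibinomial is used so non-integer weights do not raise warnings):

# weighted logit: observations with larger w contribute more to the likelihood
fit <- glm(y ~ ., data = df, family = quasibinomial(), weights = w)
summary(fit)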
 
elibrarius:
XGBOOST has an input weights array with row weights. Some other packages have it too.
I thought I could put row weights there, from 1 (for fresh rows) down to 0.5 for old rows. This would increase the impact of the new data.
I tried it and did not notice any particular improvement.

Has anyone else tried it - is there any improvement?

The idea is that these weights will affect the construction of the first tree, i.e. it is almost the same as a seed and bagging - just different techniques. Theoretically the result could change a lot if you push the predictors that separate the sample well into the background in those rows where they give the correct classification.

Isn't it possible to make a predictor be used only starting from split X? I think that would be a very useful thing for finding a good model.
 
Maxim Dmitrievsky:

Well, then just train on the new ones. These weights are for aligning the model's variance across the dataset; the same thing is used in logit regression with non-constant variance (if I'm not confused about what we're talking about).

They shouldn't give any conceptually significant improvement beyond fitting the dataset.

If you need reliable generalization to the general population from a small subsample, that is what Bayesian approaches are for.

So the alignment is achieved by the method that Vladimir suggested?

 
Aleksey Vyazmikin:

The idea is that these weights will affect the construction of the first tree, i.e. it is almost the same as a seed and bagging - just different techniques. Theoretically the result could change a lot if you push the predictors that separate the sample well into the background in those rows where they give the correct classification.

Isn't it possible to make a predictor be used only starting from split X? I think that would be a very useful thing for finding a good model.

These weights can be fed not only to boosting but also to a NN. Apparently the methodology is common to all ML systems.
The first experiment with decreasing the influence of old data showed no improvement.

The test looks better when training on 30000 rows than when training on 80000. At 80000 there are fewer trades and the error is higher. I tried decreasing the weight proportionally (from 1 for the fresh rows down to 0.5 for the old ones) - the results are almost the same.


Apparently this really is just evening out the variance, as Maxim pointed out, by the method Vladimir laid out.
