Machine learning in trading: theory, models, practice and algo-trading - page 838

 
Mihail Marchukajtes:

In general, I use this particular package to select predictors. It clearly has drawbacks, above all that it ignores interactions between several predictors with respect to the target. But on the whole it is enough for my optimization so far... So if there are other packages for data preprocessing, I'd be glad to consider them...

From my experience I can recommend the RandomUniformForest package as the most comprehensive and professional treatment of the various aspects of predictor importance. Examples can be seen here

Good luck

PS: By the way, it is one of the few packages that allows you to retrain the model on new data, which saves a lot of time.

 
Dr. Trader:

A new round each week. Within a week you have to train the model and send them the predictions. But the forward estimate of your model becomes known only three weeks later, when your predictions are compared against the real outcomes for those 3 weeks.

I think they're keeping at least 90%.

What do you mean, "I think they keep at least 90%"? Do you think they trade the forecasts that are sent to them? And how could anyone know in advance the features on which to calculate the forecast? By interpolation? Their own document says it is only "proof of work", as in mining, and the winner is also chosen randomly; probably half of the results there are nearly identical, and then a ranking filter is applied at random. In short, a shameful casino. The dataset is probably pure synthetics with a small admixture of signal; there is no market in it at all. All this talk about hedge funds etc. is purely to gain popularity for their coin.

 
Maxim Dmitrievsky:

Well, Michael, have you come back from your frenzy? Will you soon start assessing your TS sensibly and without fanaticism? :)

There was no frenzy. Only cold calculation, which has not changed since last time. Not to mention the theory behind the approach, which, mind you, still works...

 
govich:

It doesn't say exactly how much they keep for themselves. Various financial magazines wrote about a profit of 1.5 million in 2016, and if you compare that with how much they paid out to participants, it turns out to be not much.

> Do you think they trade these forecasts that are sent to them?
Yes, that's the whole strategy. For example: I create a bunch of features, build a training table, post it here on the forum, 10 people send in their forecasts, and I trade on them. It's that simple.
For a long time they had no crypto of their own; they paid in bitcoin. They simply gave out a few thousand dollars in bitcoin every week for a whole year. Then they released their own coin so they wouldn't have to deal with bitcoin.

> In general, how can one know in advance the features on which to calculate the forecast? Is it interpolation?
Interpolation, nearest-neighbor prediction, clustering: there are many options. They won't tell you the exact answer; you can only guess.

 
Vladimir Perervenko:

From my experience I can recommend the RandomUniformForest package as the most comprehensive and professional treatment of the various aspects of predictor importance. Examples can be found here

Good luck

PS: By the way, it is one of the few packages that allows you to retrain the model on new data, which saves a lot of time.

I tried it. I could not get any results...

> ruf <- randomUniformForest(X = x1, Y = y1, xtest = x2, ytest = y2, mtry = 1, ntree = 300, threads = 2, nodesize = 2)

After 5 minutes it produces:
Error in OOB.votes - Y : unsimilar multidimensional matrices

The structure of the input matrices:

> str(x1)
num [1:20000, 1:9] 0.00148 0.33309 0.46698 0.26331 -0.05916 ...
> str(y1)
num [1:20000, 1] 0 0 0 0 0 0 0 0 1 1 1 1 ...
> str(x2)
num [1:10000, 1:9] 0.000746 0.162699 0.379051 -0.529729 -0.340744 ...
> str(y2)
num [1:10000, 1] 0 0 1 1 0 0 0 0 0 0 ...

It is not clear what exactly has to be similar to what.
I tried without xtest = x2, ytest = y2: same result.
Moving on to the next package.

 
elibrarius:

I tried it. I could not get any results...

> ruf <- randomUniformForest(X = x1, Y = y1, xtest = x2, ytest = y2, mtry = 1, ntree = 300, threads = 2, nodesize = 2)

After 5 minutes it produces:
Error in OOB.votes - Y : unsimilar multidimensional matrices

The structure of the input matrices:

> str(x1)
num [1:20000, 1:9] 0.00148 0.33309 0.46698 0.26331 -0.05916 ...
> str(y1)
num [1:20000, 1] 0 0 0 0 0 0 0 0 1 1 1 1 ...
> str(x2)
num [1:10000, 1:9] 0.000746 0.162699 0.379051 -0.529729 -0.340744 ...
> str(y2)
num [1:10000, 1] 0 0 1 1 0 0 0 0 0 0 0 ...

It is not clear what exactly has to be similar to what.
I tried without xtest = x2, ytest = y2: same result.
Moving on to the next package.

I don't know why it didn't work, everything works for me.

I got good results in caret. It has three functions for selecting predictors; they differ in effectiveness and also in how many computational resources they consume.


There is another very interesting package: CORElearn. That package has two functions for selecting predictors; I used them as a pair, and they give very good results on my predictors. Particularly curious is attrEval, with an absolutely fantastic set of evaluation methods, among which a special place belongs to the Relief group, which evaluates not only a single observation (row) but also its nearest rows.
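The Relief idea mentioned above scores a feature by how well it separates each row from its nearest row of the other class while keeping it close to its nearest row of the same class. A minimal base-R sketch of basic Relief for a binary target (an illustration of the idea only, not CORElearn's implementation; the function and variable names are my own):

```r
# Minimal sketch of basic Relief (binary target) in base R.
# Illustration only; CORElearn::attrEval implements far more refined estimators.
relief_weights <- function(X, y) {
  X <- as.matrix(X)
  rng <- apply(X, 2, function(col) diff(range(col)))
  rng[rng == 0] <- 1
  Xs <- sweep(sweep(X, 2, apply(X, 2, min)), 2, rng, "/")  # scale to [0, 1]
  n <- nrow(Xs)
  w <- numeric(ncol(Xs))
  for (i in seq_len(n)) {
    d <- rowSums(abs(sweep(Xs, 2, Xs[i, ])))       # L1 distance to every row
    d[i] <- Inf                                    # exclude the row itself
    hit  <- which.min(ifelse(y == y[i], d, Inf))   # nearest same-class row
    miss <- which.min(ifelse(y != y[i], d, Inf))   # nearest other-class row
    # good features agree with the nearest hit and differ from the nearest miss
    w <- w - abs(Xs[i, ] - Xs[hit, ]) / n + abs(Xs[i, ] - Xs[miss, ]) / n
  }
  setNames(w, colnames(Xs))
}

# A feature that drives the class should get a clearly higher weight than noise:
set.seed(1)
y <- rep(c(0, 1), each = 50)
X <- cbind(signal = y + rnorm(100, sd = 0.1), noise = rnorm(100))
w <- relief_weights(X, y)
```

With data like this, `w["signal"]` comes out well above `w["noise"]`, which is exactly the "evaluates nearest rows" behavior that makes the Relief group interesting.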


Good luck.


PS.

Don't forget that predictor selection should consist of at least the following steps:

  • selection of predictors relevant to the target. Wizard here gave a link to the theory behind this step. Two approaches can be distinguished: statistics and entropy. Code for both has been posted here
  • selection by the packages listed above, which is NOT tied to the future model
  • selection based on model results. Very effective with linear models: for example, use glm to pick only the significant predictors and feed only those to the network. The result may surprise you.
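The third step can be sketched in base R on synthetic data (the data, names, and the 5% threshold are my own choices, not from the post): fit a glm, keep only the predictors significant at the 5% level, and pass only those onward.

```r
# Sketch of step 3: pre-select predictors via glm significance (base R only).
# Synthetic data: x1 and x2 drive the target, x3 is pure noise.
set.seed(42)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- rbinom(n, 1, plogis(2 * df$x1 - 1.5 * df$x2))

fit   <- glm(y ~ ., data = df, family = binomial)
pvals <- summary(fit)$coefficients[-1, "Pr(>|z|)"]  # drop the intercept row
keep  <- names(pvals)[pvals < 0.05]                 # significant predictors only
# df[, keep, drop = FALSE] is what would then go to the network / next model
```

Here `keep` ends up containing the genuinely informative predictors, and only those columns are handed to the next model.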


Predictors may require pre-processing before selection, e.g. centering. This is well described in the article by Vladimir Perervenko

 
elibrarius:

I tried it. I could not get any results...

> ruf <- randomUniformForest(X = x1, Y = y1, xtest = x2, ytest = y2, mtry = 1, ntree = 300, threads = 2, nodesize = 2)

After 5 minutes it produces:
Error in OOB.votes - Y : unsimilar multidimensional matrices

The structure of the input matrices:

> str(x1)
num [1:20000, 1:9] 0.00148 0.33309 0.46698 0.26331 -0.05916 ...
> str(y1)
num [1:20000, 1] 0 0 0 0 0 0 0 0 1 1 1 1 ...
> str(x2)
num [1:10000, 1:9] 0.000746 0.162699 0.379051 -0.529729 -0.340744 ...
> str(y2)
num [1:10000, 1] 0 0 1 1 0 0 0 0 0 0 0 ...

It is not clear what exactly has to be similar to what.
I tried without xtest = x2, ytest = y2: same result.
Moving on to the next package.

Can you post the original sets?

Since your target is numeric and not a factor, the package treats the task as regression. You need to tell it that this is not a regression: add the parameter

ruf <- randomUniformForest(X = x1, Y = y1, xtest = x2, ytest = y2, mtry = 3, ntree = 300, threads = 2, nodesize = 2, regression = FALSE)

or

ruf <- randomUniformForest(X = x1, Y = y1 %>% as.factor, xtest = x2, ytest = y2 %>% as.factor, mtry = 3, ntree = 300, threads = 2, nodesize = 2)

Good luck

 
SanSanych Fomenko:

There is another very interesting package: CORElearn. That package has two functions for selecting predictors; I used them as a pair, and they give very good results on my predictors. Particularly curious is attrEval, with an absolutely fantastic set of evaluation methods, among which a special place belongs to the Relief group, which evaluates not only a single observation (row) but also its nearest rows.


I agree. Basically, this is probably the most serious package for RF. Pay attention to the developer, Marko Robnik-Sikonja.

Good luck

 
Vladimir Perervenko:

Can you post the original sets?

Since your target is numeric and not a factor, the package treats the task as regression. You need to tell it that this is not a regression: add the parameter

ruf <- randomUniformForest(X = x1, Y = y1, xtest = x2, ytest = y2, mtry = 3, ntree = 300, threads = 2, nodesize = 2, regression = FALSE)

or

ruf <- randomUniformForest(X = x1, Y = y1 %>% as.factor, xtest = x2, ytest = y2 %>% as.factor, mtry = 3, ntree = 300, threads = 2, nodesize = 2)

Good luck

That helped. Thank you!
 
Dr. Trader:

It doesn't say exactly how much they keep for themselves. Various financial magazines wrote about a profit of 1.5 million in 2016, and if you compare that with how much they paid out to participants, it turns out to be not much.

> Do you think they trade these forecasts that are sent to them?
Yes, that's the whole strategy. For example: I create a bunch of features, build a training table, post it here on the forum, 10 people send in their forecasts, and I trade on them. It's that simple.
For a long time they had no crypto of their own; they paid in bitcoin. They simply gave out a few thousand dollars in bitcoin every week for a whole year. Then they released their own coin so they wouldn't have to deal with bitcoin.

> In general, how can one know in advance the features on which to calculate the forecast? Is it interpolation?
Interpolation, nearest-neighbor prediction, clustering: there are many options. They won't tell you the exact answer; you can only guess.

$1.5M is pennies for a whole firm. I heard that when they listed their crypto on an exchange, some participants (those at the top) each took away if not millions of dollars then hundreds of grand: you could take first place once and get 4000 NMR at $200 per coin = $800,000. True, the freebie quickly ran out, NMR collapsed and the coin payouts became orders of magnitude smaller, but someone probably got lucky.

IMHO, in the beginning they may really have traded the forecasts, and the leaderboard places were more or less predictable; probably 90% of the money they paid to themselves, and most of the top hundred were probably their own people, so the money didn't leak out to who knows whom. But now it's a pure casino with "proof of work" and a lot of randomness; at least that's the rumor.


PS: before their coin they paid out $6k a week (but to whom?), i.e. $288k a year. That comes out to the same "honest" ~20% of the $1.5M profit going to the quants))) Though obviously all these figures can be fabricated.
