Machine learning in trading: theory, models, practice and algo-trading - page 1964

 
Aleksey Vyazmikin:

Earlier I probably didn't explain that at least 1% of the observations should remain in the leaf on small samples, and at least 100 on large samples, so of course the split will not go all the way to zero error in the leaf for either class.

You seem to misunderstand the last step - I see it as a statistical evaluation of the remaining 1% of the sample: within this sample we check whether the result improves with splits by different predictors, and we get information about the subspace, for example:

If A > x1, then target 1 will be true 40% of the time, covering 60% of the subsample

If B > x2, then target 1 will be correctly identified 55% of the time, covering 45% of the subsample

If A <= x1, then target 1 will be true 70% of the time, covering 50% of the subsample

Each such split has a significance coefficient (I haven't decided how to calculate it yet), and so does the last split.

and so on, let's say up to 5-10 predictors; then, when applying the model, if we reach the last split, we add up the coefficients (or use a more elaborate method of summation), and if the sum of coefficients exceeds a threshold, the leaf is classified as 1, otherwise as zero.
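A minimal sketch of that summation step in Python (all rules, coefficients and the threshold below are hypothetical placeholders, not the actual method):

def classify_in_leaf(row, rules, threshold):
    # Sum the significance coefficients of the conditions that fire on this row;
    # classify as 1 if the sum exceeds the threshold, otherwise 0.
    score = sum(coef for predicate, coef in rules if predicate(row))
    return 1 if score > threshold else 0

# Conditions from the example above, with made-up cut-offs and coefficients:
x1, x2 = 0.5, 1.2
rules = [
    (lambda r: r["A"] > x1,  0.40),
    (lambda r: r["B"] > x2,  0.55),
    (lambda r: r["A"] <= x1, 0.70),
]

print(classify_in_leaf({"A": 0.3, "B": 2.0}, rules, threshold=1.0))  # -> 1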


A simple way to implement this is to forcibly build a forest up to the penultimate split and then exclude the predictors already selected from the sample, so that new ones get selected. Or simply, after building the tree, filter the sample by leaf and go through each predictor on its own in search of the best split that meets the criteria of completeness (recall) and accuracy (precision) - roughly as in the sketch below.
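Roughly, that per-leaf search could look like this (a sketch only; the column handling and the minimum precision/recall limits are assumptions for illustration):

import numpy as np

def best_split_per_predictor(leaf_df, target_col, min_precision=0.55, min_recall=0.40):
    # leaf_df: the rows of the training sample that fall into one leaf.
    # For every predictor, scan its values as candidate thresholds and keep the
    # best split that satisfies the accuracy (precision) and completeness (recall) limits.
    results = {}
    y = leaf_df[target_col].values
    for col in leaf_df.columns.drop(target_col):
        best = None
        for thr in np.unique(leaf_df[col].values):
            mask = leaf_df[col].values > thr
            if mask.sum() == 0:
                continue
            precision = y[mask].mean()                # share of class 1 among covered rows
            recall = y[mask].sum() / max(y.sum(), 1)  # share of class 1 that is covered
            if precision >= min_precision and recall >= min_recall:
                if best is None or precision > best[1]:
                    best = (thr, precision, recall)
        if best is not None:
            results[col] = best
    return results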

Also, the result on the training sample will improve if the other class "0" means no action rather than an entry in the opposite direction; otherwise there can be either improvement or deterioration.

Anyway, the training result will be something between a tree trained to depth N and one trained to depth N+1, for example 6 and 7. If at level 6 the error is 20% and at level 7 the error is 18%, your method will give an error between them, for example 19%. Do you think a gain of 1% is worth the time?
Earlier I described a simple way - train some trees to depth 6 and some to depth 7. That would require rewriting the code.
Now I've come up with a simpler way that doesn't require rewriting anything: just build a random forest with any package to depth 6 and another forest to depth 7, then average them.
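For example, with scikit-learn (synthetic data here just to make the sketch runnable - substitute your own sample):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One forest limited to depth 6, another to depth 7.
rf6 = RandomForestClassifier(n_estimators=100, max_depth=6, random_state=1).fit(X_train, y_train)
rf7 = RandomForestClassifier(n_estimators=100, max_depth=7, random_state=2).fit(X_train, y_train)

# Average the class-1 probabilities of the two forests.
proba = (rf6.predict_proba(X_test)[:, 1] + rf7.predict_proba(X_test)[:, 1]) / 2
pred = (proba > 0.5).astype(int)
print("test accuracy:", (pred == y_test).mean())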


The training sample should not bother us - it is always fine.

 
elibrarius:
All the same, the training result will be something between a tree trained to depth N and one trained to depth N+1, for example 6 and 7. If at level 6 the error is 20% and at level 7 the error is 18%, your method will give an error between them, for example 19%. Do you think a gain of 1% is worth the time?
Earlier I described a simple way - train some trees to depth 6 and some to depth 7. That would require rewriting the code.
Now I've come up with a simpler way that doesn't require rewriting anything: just build a random forest with any package to depth 6 and another forest to depth 7, then average them.


The training sample shouldn't worry us - it's always fine.

The gain is usually more than 1%.

Of course you can use random forests, but how do you get them to be identical up to the penultimate split? Suppose you've trained 10 trees to the 6th split, and you train 10 more trees the same way, but to the 7th.

 
Aleksey Vyazmikin:

The gain is usually more than 1%.

Of course you can use random forests, but how do you make them identical up to the penultimate split? Suppose you trained 10 trees to the 6th split, and we train the remaining 10 the same way, but to the 7th.

No way - that's why they are random: they take random columns for training. Averaging then gives good results.
You can try setting the fraction of columns = 1, so that all columns are involved in building the tree rather than a random 50% of them. Then all trees will be the same, so also set the forest to 1 tree. In total, one forest with one tree is trained to depth 6, the other to depth 7.
If you need more than 2 trees, then remove some columns from the set yourself and train additional forests on all the remaining columns.

Addendum: the fraction of rows involved in training should also be set = 1, i.e. all of them, so that training is identical. That way everything random is removed from the random forest.
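In scikit-learn terms this could look roughly as follows (a sketch under the stated assumptions; with all columns, all rows and a fixed seed, the depth-7 tree should repeat the depth-6 tree's splits and only add one more level):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# All columns (max_features=None), all rows (bootstrap=False), a single tree:
# the only difference between the two models is the depth limit.
tree6 = RandomForestClassifier(n_estimators=1, max_depth=6, max_features=None,
                               bootstrap=False, random_state=0).fit(X, y)
tree7 = RandomForestClassifier(n_estimators=1, max_depth=7, max_features=None,
                               bootstrap=False, random_state=0).fit(X, y)

avg_proba = (tree6.predict_proba(X)[:, 1] + tree7.predict_proba(X)[:, 1]) / 2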
 

It's a challenge to come up with the trading logic for that NS

so far we have these


plus the NS architecture varies over a wide range - a lot to search through

the main thing is to pick the right reward

 
Maxim Dmitrievsky:

It's a real challenge to come up with the trading logic for that NS

so far we have these


plus the NS architecture varies over a wide range - a lot to search through

the main thing is to choose the right reward

Have you taken up NS with reinforcement learning again? That's where the reward is used, it seems.

 

I suggest testing on this data, there is definitely a pattern there and it is clear what to strive for.

P.S. Remove .txt from the file names.

Files:
test.zip.001.txt  15360 kb
test.zip.002.txt  13906 kb
 

In 1.5 months. Complete self-learning without intervention

I'll do some more digging later... too many parameters

 
Maxim Dmitrievsky:

In 1.5 months. Complete self-learning without intervention

I'll do some more digging later... too many parameters

Not bad).
 
Maxim Dmitrievsky:

In 1.5 months. Complete self-learning without intervention

I'll do some more digging later... too many parameters

So is this trading on new data, or how should I understand it?

 
mytarmailS:

So is this trading on new data, or how should this be understood?

You just run it and it trades; it learns as it goes