Machine learning in trading: theory, models, practice and algo-trading - page 97

Keep distrusting variable importance when applying it to forex. Iris is very simple data: there are direct correlations between the available predictors and the classes. RF only has to find a minimal set of predictors on which the iris classes can be defined, and you're done.
And what, can RF catch indirect dependencies? It seems to me that for the market it fails solely because the predictors are rotten; with normal predictors it would work as well as on iris, with 95%-level accuracy.
With irises it's simple: if the petal length is in such-and-such a range, it's class 1; if the width is in another range, it's class 2, and so on. What RF does is find the intervals of predictor values that best match the target values.
I don't even need a forest for this task; one tree is enough for 90% accuracy.
That is, if each class corresponds to a certain range of predictor values (or combinations of them), and these intervals do not overlap, then a tree or a forest will solve the problem 100%.
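The iris claim above is easy to sanity-check. A minimal sketch (Python/scikit-learn here rather than the R used elsewhere in the thread; the depth limit and 30% holdout are my own assumptions):

```python
# A single shallow decision tree on iris: the class boundaries really are
# simple intervals of petal measurements, so one tree is already enough.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_tr, y_tr)
acc = tree.score(X_te, y_te)
print(acc)  # well above 0.9 on the holdout
```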
Dependencies on forex are much more complicated, and the forest needs dozens of predictors to describe the target values. It will surely find intervals of predictors, and combinations of them, that describe the targets, but those will be merely fitted combinations, without any logic or analysis behind them. Whether the forest latches onto noise or onto a genuinely important predictor is a matter of chance. The forest will only work correctly for forex if the unsuitable predictors are weeded out in advance and only the necessary ones are left for training. And the problem is that the necessary predictors have to be identified or found somehow, and the forest is no help in that.
I haven't managed to get ForeCA working yet.
Most of the time went into sifting out predictors that produce zero eigenvalues after cov() on the training table (I assume these are the mutually correlated predictors). After 24 hours it finally came down to training the ForeCA model itself, which failed because of a bug:
The package is very demanding about its predictors, with lots of restrictions of all sorts. I don't even know what the last error means; I'll keep digging.
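The sifting step described above can be sketched like this (Python/NumPy rather than R; the drop-one-column-at-a-time heuristic is my own assumption, not the author's code):

```python
# Drop columns until the covariance matrix has no (numerically) zero
# eigenvalues, i.e. until it has full rank. A zero eigenvalue means some
# column is a linear combination of the others.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = np.column_stack([X, X[:, 0] + X[:, 1]])  # 6th column is an exact linear combination

def drop_collinear(X, tol=1e-10):
    while True:
        # eigh returns eigenvalues in ascending order for a symmetric matrix
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[0] > tol:
            return X
        # remove the column with the largest weight in the degenerate direction
        X = np.delete(X, np.argmax(np.abs(eigvecs[:, 0])), axis=1)

X_clean = drop_collinear(X)
print(X_clean.shape)  # one redundant column removed -> (200, 5)
```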
I'll finish this later.
Google says you don't have to remove the predictors. You can transform them so that they are no longer correlated; then the covariance matrix will have full rank, which is what ForeCA requires. The package itself has some functions for whitening (they didn't work right away, it needs figuring out), plus there is theory in the links below.
To use the ForeCA package properly, you first need to understand whitening and learn how to do it:
http://stats.stackexchange.com/questions/23420/is-whitening-always-good
http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
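The whitening idea from those links, as a minimal sketch (Python/NumPy, not the package's own whitening functions; ZCA whitening is one of several valid choices):

```python
# Linearly transform correlated predictors so the covariance of the
# result is the identity matrix -- full rank by construction.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated columns

Xc = X - X.mean(axis=0)                      # center first
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T  # ZCA whitening matrix C^(-1/2)
Xw = Xc @ W

print(np.allclose(np.cov(Xw, rowvar=False), np.eye(4), atol=1e-8))  # True
```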
1) The forest will only work correctly for forex if the unsuitable predictors are weeded out in advance and only the necessary ones are left for training. And the problem is that the necessary predictors have to be identified or found somehow, and the forest is no help in that.
2) The package is very demanding about its predictors, with lots of restrictions. I don't even know what the last error means; I'll keep looking into it.
1) I suggested what I think is a very good idea for how to do such a selection, but no one was interested, and I can't implement it myself.
2) I can't do that myself either; the only option is to reduce the amount of data, otherwise, well, you know. :)
I've already posted this, but no one reacted to it.
There is a notion in time-series analysis called dynamic time warping (DTW); you can use it to make the price chart more readable, and therefore more recognizable for the algorithm.
Everything seems fine, but the sad part is that as a result of this transformation we get two coordinates: x, a synthetic time, and y, the values.
The question is how to turn this transformation back into a vector so that it doesn't lose its properties.
This is what the transformation looks like: the ordinary chart on top, the DTW version below.
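To make the "two coordinates" point concrete, here is a plain DTW sketch (Python, no external DTW package). The DP produces an alignment path, i.e. pairs of (synthetic time, value) indices, which is exactly the two-coordinate output discussed above. One possible way back to an ordinary vector, an assumption on my part rather than a known answer to the question, is to average, for each index of the reference series, all points of the other series matched to it:

```python
import numpy as np

def dtw_path(a, b):
    # Classic dynamic-programming DTW with |a_i - b_j| cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the corner to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def warp_to_reference(a, b):
    # Collapse the (i, j) path into a vector on a's time axis by averaging
    # all b-points aligned to each a-index.
    out = np.zeros(len(a))
    cnt = np.zeros(len(a))
    for i, j in dtw_path(a, b):
        out[i] += b[j]
        cnt[i] += 1
    return out / cnt

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
print(warp_to_reference(a, b))  # same length as a
```

Whether averaging preserves the properties one cares about is exactly the open question in the thread; this only shows one mechanical way to get a vector back.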
1) I suggested what I think is a pretty good idea for how to do such a selection, but no one was interested, and I can't implement it myself.
2) The only option is to reduce the amount of data, or, well, you know. :)
What do you suggest? What did I miss? Could you repeat your idea?
The question is how to turn this transformation back into a normal vector so that it does not lose its properties.
I made one more example with ForeCA; the archive contains a small table for testing and the code for working with it.
This time I've got it right.
You can take your own table with training data for the model; the main thing is that it must be a matrix without factors (training is with lm, so regression only). The number of rows should be much greater than the number of predictors, otherwise ForeCA will throw errors.
My target values are 0 and 1; with anything else the accuracy will be computed incorrectly. If so, adjust the code in RegressionToClasses01(regr) for your case, in the place where the regression result is rounded into classes.
trainData - data for training
trainDataFT - data for fronttest
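The rounding step mentioned above can be sketched as follows (Python rather than the author's R; the function name mirrors RegressionToClasses01 from the post, and the 0.5 threshold and sample numbers are my own illustration):

```python
# A regression trained on 0/1 targets is turned into class labels by
# thresholding at 0.5; accuracy is the share of matching labels.
import numpy as np

def regression_to_classes01(pred, threshold=0.5):
    return (np.asarray(pred) >= threshold).astype(int)

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0.2, 0.8, 0.55, 0.4, 0.45])  # raw regression output

classes = regression_to_classes01(y_pred)
accuracy = np.mean(classes == y_true)
print(classes, accuracy)  # [0 1 1 0 0] 0.8
```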
Result:
lm on raw data: 75% accuracy on the training data and 57% on new data.
lm on all 14 components from ForeCA: 75% on the training data and 61% on new data. A little better, but in this case +4% is only +1 correct prediction; the table is quite small :)
That is, if the predictors are already pre-selected, then after ForeCA the result should be no worse, and maybe even a couple of percent of accuracy will be gained.
I also added a graph of accuracy versus the number of ForeCA components. It appears that the more components, the more accurate the result. The maximum allowed number of components equals the number of predictors.
The second part of the experiment:
I took the 14 previously selected predictors and added another 14 with random values. The maximum allowed number of ForeCA components became 28.
Prediction accuracy with all 28 components on the training data is 76% in both cases (with and without ForeCA); accuracy on new data is 57% in both cases.
In my opinion, ForeCA did not cope with the garbage among the predictors; I did not see the expected miracle.
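A small synthetic replica of that second experiment, for readers who want to play with the setup (Python/NumPy, entirely made-up data, not the author's table; lm is emulated with ordinary least squares plus a 0.5 threshold):

```python
# 14 informative predictors plus 14 pure-noise columns; fit a linear
# model and compare held-out accuracy with and without the garbage.
import numpy as np

rng = np.random.default_rng(42)
n, k = 300, 14
X_good = rng.normal(size=(n, k))
w = rng.normal(size=k)
y = (X_good @ w + 0.5 * rng.normal(size=n) > 0).astype(float)
X_noise = rng.normal(size=(n, k))       # 14 garbage predictors
X_all = np.hstack([X_good, X_noise])

def lm_accuracy(X, y, n_train=200):
    # Least-squares fit on the first n_train rows, threshold at 0.5,
    # accuracy on the remaining rows.
    Xb = np.hstack([np.ones((len(X), 1)), X])  # add intercept
    beta, *_ = np.linalg.lstsq(Xb[:n_train], y[:n_train], rcond=None)
    pred = (Xb[n_train:] @ beta >= 0.5).astype(float)
    return np.mean(pred == y[n_train:])

acc_clean = lm_accuracy(X_good, y)
acc_noisy = lm_accuracy(X_all, y)
print(acc_clean, acc_noisy)
```

With purely linear synthetic data the noise columns hurt far less than on real market data, so this only illustrates the mechanics of the comparison, not the conclusion.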
So what?