I keep telling them that predicting the market for 5 years is not realistic...
But that's not the point. The point is that jPrediction, since version 7, can assess the significance of predictors. To do this, after creating (training) a new model or loading a previously saved model from a file, select the menu item "View a significant of predictors" or press the hotkey F5:
And you can look at the table of predictor significance:
The best predictor is the most significant one. If you remove the "Competitiveness" column from this sample, you get the message "Garbage in, Garbage out" after training.
The worst predictor is the least significant one. If you remove the "Operating risk" column from this sample, the generalization ability will not get worse.
The remaining predictors marked as "-" in Description are of average significance. If you remove them from this sample, the generalization ability will noticeably deteriorate.
How is the significance of predictors calculated?
In short (though not very clearly), predictor significance is calculated from the weight coefficients obtained after training.
You can see the detailed algorithm for calculating predictor significance in the jPrediction source code. Otherwise I would have to write a whole article to explain it more clearly.
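To make the general idea a bit more concrete, here is a minimal sketch of weight-based significance. It is not the jPrediction algorithm (that is only in its source code). It assumes a model with one weight per predictor and inputs on a comparable scale, and it ranks predictors by their share of the total absolute weight. The weight values and the names "Predictor B" and "Predictor C" are made up for illustration.

```python
def significance_from_weights(weights):
    """Map each predictor to its share of the total absolute weight."""
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("all weights are zero, nothing to rank")
    return {name: abs(w) / total for name, w in weights.items()}


# Made-up weights for illustration only (assumption, not jPrediction output).
weights = {
    "Competitiveness": -2.0,
    "Predictor B": 1.0,
    "Predictor C": 0.75,
    "Operating risk": 0.25,
}

significance = significance_from_weights(weights)

# Most significant predictor first, least significant last.
ranked = sorted(significance, key=significance.get, reverse=True)
for name in ranked:
    print(f"{name}: {significance[name]:.4f}")
```

The sign of a weight is ignored here because a large negative weight influences the output as much as a large positive one.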
Thank you!!! An extremely useful addition. I'll keep tinkering...
The main thing is that now you can very quickly identify insignificant predictors and replace them with other predictors. After each replacement you need to check whether the generalization ability increased. If it did not, the replacement was a bad one, i.e. a more significant predictor was replaced with a less significant one.
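The replace-and-check routine can be sketched roughly like this. The `evaluate` callable is a hypothetical stand-in for "train the model on these predictors and measure generalization ability"; it is not a jPrediction function, and the toy scores below are invented purely so the example runs.

```python
def try_replacement(evaluate, predictors, old, new):
    """Swap `old` for `new`; keep the swap only if the score increases."""
    baseline = evaluate(predictors)
    candidate = [new if p == old else p for p in predictors]
    score = evaluate(candidate)
    if score > baseline:
        return candidate, score   # replacement helped, keep it
    return predictors, baseline   # replacement did not help, roll back


# Toy stand-in for training plus measuring generalization ability (assumption).
toy_value = {"A": 0.30, "B": 0.05, "C": 0.20, "D": 0.01}


def toy_evaluate(predictors):
    return sum(toy_value[p] for p in predictors)


# Replacing weak "B" with stronger "C" is kept.
kept, kept_score = try_replacement(toy_evaluate, ["A", "B"], old="B", new="C")

# Replacing "B" with even weaker "D" is rolled back.
rejected, rejected_score = try_replacement(toy_evaluate, ["A", "B"], old="B", new="D")

print(kept, kept_score)
print(rejected, rejected_score)
```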
Yesterday I experimented with quotes. I quickly found the most significant TA oscillators, but there were only 5 of them. Beyond that, the generalization ability does not grow no matter what you add. It turns out that whatever you do with TA indicators and oscillators, they are all based on the same data: a small segment of the previous history (several bars), which they merely process a little differently. All TA indicators and oscillators are the same thing, just viewed from a different angle. No matter how you shuffle the deck, the same cards are in it. All the indicators and oscillators correlate too strongly with each other and very poorly with the future.
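A quick way to check this kind of claim on your own data is to compare the correlation between two indicators with the correlation between an indicator and the next bar's price change. The sketch below uses a synthetic random walk instead of real quotes (an assumption, so the future is unpredictable by construction), and two simple made-up indicators: momentum over 5 bars and the deviation from a 5-bar moving average. On real quotes the numbers will differ.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Synthetic "quotes": a random walk (assumption, not real market data).
random.seed(0)
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] + random.gauss(0, 1))

n = 5  # both indicators look at the same few bars
momentum, sma_dev, future = [], [], []
for t in range(n, len(prices) - 1):
    window = prices[t - n + 1 : t + 1]              # last n prices
    momentum.append(prices[t] - prices[t - n])      # change over n bars
    sma_dev.append(prices[t] - sum(window) / n)     # distance from moving average
    future.append(prices[t + 1] - prices[t])        # next bar's change

corr_indicators = pearson(momentum, sma_dev)
corr_future = pearson(momentum, future)
print(f"indicator vs indicator: {corr_indicators:.3f}")
print(f"indicator vs next bar:  {corr_future:.3f}")
```

To try it on real quotes, replace `prices` with a list of closing prices.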
To increase the generalization ability, we need to take some other data that affects the quotes but is not derived from the quotes, i.e. we need additional sources of information. But I cannot think of where to get them. Of course we could try predictors such as moon phases, the number of sunspots, the results of street soccer games, the water level in the Vonyuchka River, or the number of fleas per square centimeter of Tuzik the mutt. But are they likely to be meaningful?
As for astrology, I wouldn't dismiss a practice that is thousands of years old. As a fan I can say that a favorite team's loss negatively affects labor productivity. And if Mukhosransk is a single-industry town with some kind of resource monopolist like Nornickel, then production may be falling, as indirectly evidenced by the drop in the water level in the Vonyuchka River.
It is impossible to guess which butterfly, where and when, will cause a tsunami with a flap of its wing.
In case anyone is interested: I found a package called quantstrat that can simulate trading and build trading systems.
