Machine learning in trading: theory, models, practice and algo-trading - page 84

 
I keep telling them that predicting the market five years ahead is unrealistic. It would be enough to build a model that stays adequate for a week or two and be happy with that, yet they want a model that holds for a whole five years. That is a utopia. And a question for the all-knowing Alexey: what will the model do if two absolutely identical patterns are identified, but the market's reaction to them is diametrically opposite? Contradictory data, so to speak. How would the model behave in that case? After all, the pattern itself is not important; what matters is the market's reaction to the pattern.
 
Mihail Marchukajtes:
I keep telling them that predicting the market for 5 years is not realistic...

It's useless to argue and exhort. Many people have cognitive biases, including information filters: if information doesn't fit their worldview, it is either not perceived at all or provokes a hostile response.

That is, most of the regulars here will either pay no attention to what you say or start grumbling and calling you a troll.

But that's not the point. The point is that, starting with version 7, jPrediction can assess the significance of predictors. After creating (training) a new model or loading a previously saved model from a file, choose the menu item "View a significant of predictors" or press the F5 hotkey:

You can then look at the table of predictor significance:

"best predictor" marks the best, i.e. the most significant, predictor. If you remove the "Competitiveness" column from this sample, training ends with the message "Garbage in, Garbage out".

"worst predictor" marks the worst, i.e. the least significant, predictor. If you remove the "Operating risk" column from this sample, the generalization ability will not get worse.

The remaining predictors, marked "-" in the Description column, are of average significance. If you remove them from this sample, the generalization ability will noticeably deteriorate.

 
Yury Reshetov:

But that's not the point. The point is that, starting with version 7, jPrediction can assess the significance of predictors...

Thank you!!! An extremely useful addition. I'll keep tinkering with it...
 
Yury Reshetov:

But that's not the point. The point is that, starting with version 7, jPrediction can assess the significance of predictors...

How do you calculate the significance of predictors?
 
SanSanych Fomenko:
How is the significance of predictors calculated?

In short (though not very clearly): predictor significance is calculated from the weight coefficients obtained after training.

The detailed algorithm for calculating predictor significance can be found in the jPrediction source code. Otherwise I would have to write a whole article to explain it more clearly.
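The jPrediction algorithm itself is not shown in this thread, so the following is only a rough sketch of the general idea of ranking predictors by trained weights. It is not jPrediction's code. It assumes a plain logistic regression trained by gradient descent on standardized inputs; all function names and the toy sample are hypothetical.

```python
import math
import random

def standardize(rows):
    """Scale each column to zero mean and unit variance so weights are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def train_logistic(rows, labels, epochs=300, lr=0.1):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def significance(weights):
    """Share of the total absolute weight carried by each predictor."""
    total = sum(abs(wi) for wi in weights) or 1.0
    return [abs(wi) / total for wi in weights]

random.seed(0)
# Toy sample (assumption): column 0 drives the label, column 1 is weaker, column 2 is noise.
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [1 if 2.0 * r[0] + 0.5 * r[1] + random.gauss(0, 0.5) > 0 else 0 for r in X]

w, b = train_logistic(standardize(X), y)
scores = significance(w)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("significance:", [round(s, 3) for s in scores])
print("ranking (most to least significant):", ranking)
```

Standardizing the inputs first matters here: without it, the size of a weight reflects the scale of the predictor as much as its usefulness.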

 
Mihail Marchukajtes:
Thank you!!! An extremely useful addition. I'll keep tinkering with it...

The main thing is that you can now very quickly identify insignificant predictors and replace them with others. After each replacement, check whether the generalization ability has increased. If it has not, the replacement was wrong, i.e. a more significant predictor was replaced with a less significant one.
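The replace-and-check loop described above can be sketched roughly as follows. This is not jPrediction: it assumes a simple nearest-centroid classifier as a stand-in model and hold-out accuracy as a stand-in for generalization ability, and the helper names and toy sample are hypothetical.

```python
import random

def select(rows, columns):
    """Keep only the listed predictor columns."""
    return [[r[j] for j in columns] for r in rows]

def centroid_fit(rows, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def centroid_predict(model, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def holdout_accuracy(rows, labels, columns, split=0.7):
    """Train on the first part of the sample, score on the untouched remainder."""
    cut = int(len(rows) * split)
    data = select(rows, columns)
    model = centroid_fit(data[:cut], labels[:cut])
    hits = sum(centroid_predict(model, x) == y
               for x, y in zip(data[cut:], labels[cut:]))
    return hits / (len(rows) - cut)

random.seed(1)
# Toy sample (assumption): columns 0 and 3 carry signal, columns 1 and 2 are pure noise.
rows, labels = [], []
for _ in range(2000):
    y = random.randint(0, 1)
    shift = 1.0 if y else -1.0
    rows.append([random.gauss(shift, 1), random.gauss(0, 1),
                 random.gauss(0, 1), random.gauss(shift, 1)])
    labels.append(y)

before = holdout_accuracy(rows, labels, columns=[0, 1, 2])
after = holdout_accuracy(rows, labels, columns=[0, 1, 3])  # noise column 2 swapped for column 3
print(f"before={before:.3f} after={after:.3f} keep_replacement={after > before}")
```

The replacement is kept only if the out-of-sample score improves; otherwise the original predictor goes back in.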

Yesterday I experimented with quotes. I quickly found the most significant TA oscillators, but there were only 5 of them. Beyond that, the generalization ability does not grow, no matter what you add. It turns out that all TA indicators and oscillators are built on the same data, a short segment of recent history (several bars), and merely process it slightly differently. They are all the same thing viewed from a different angle: no matter how you shuffle the deck, the same cards are in it. All the indicators and oscillators correlate too strongly with each other and very poorly with the future.
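A small illustration of that last point, under loud assumptions: the price series is a synthetic random walk (so the lack of correlation with the future is true by construction), and the two "oscillators" are simple stand-ins computed from the same 10 bars. It says nothing about real quotes; it only shows how indicators derived from the same window end up strongly correlated with each other.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

random.seed(42)
N, PERIOD = 5000, 10

# Synthetic random-walk series (assumption, not real quotes).
prices = [100.0]
for _ in range(N):
    prices.append(prices[-1] + random.gauss(0, 1))

momentum, sma_deviation, next_change = [], [], []
for t in range(PERIOD, len(prices) - 1):
    window = prices[t - PERIOD + 1:t + 1]             # last PERIOD bars, including bar t
    momentum.append(prices[t] - prices[t - PERIOD])   # change over PERIOD bars
    sma_deviation.append(prices[t] - sum(window) / PERIOD)  # distance from the moving average
    next_change.append(prices[t + 1] - prices[t])     # what happens on the next bar

corr_between_indicators = pearson(momentum, sma_deviation)
corr_with_future = pearson(momentum, next_change)
print(f"indicator vs indicator: {corr_between_indicators:.2f}")
print(f"indicator vs next bar:  {corr_with_future:.2f}")
```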

To increase the generalization ability, we need data that affects the quotes but is not derived from them, i.e. some additional sources of information. But I can't think of where to get them. Of course, we could try predictors such as moon phases, the number of sunspots, the results of a street soccer team's games, the water level in the Vonyuchka River or the number of fleas per square centimeter on Tuzik the mutt. But are they likely to be significant?

 
Yury Reshetov:

Of course, we could try predictors such as moon phases, the number of sunspots, the results of a street soccer team's games, the water level in the Vonyuchka River or the number of fleas per square centimeter on Tuzik the mutt. But are they likely to be significant?

As for astrology, I wouldn't dismiss a practice that is thousands of years old. As a fan, I can say that a favorite team's loss hurts labor productivity. If Mukhosransk is a single-industry town built around a resource monopolist like Nornickel, production may fall, and a drop in the water level of the Vonyuchka River would be indirect evidence of that.

It is impossible to guess which butterfly, where and when, will cause a tsunami with a flap of its wing.

 
Yury Reshetov:

To increase the generalization ability, we need data that affects the quotes but is not derived from them, i.e. some additional sources of information. But I can't think of where to get them...

Try cumulative delta. Cumulative distribution by real volumes... A Z-score system. And, naturally, take all of this from different pairs, even exotic ones; there should be no correlation there :-)
 
Mihail Marchukajtes:
Try cumulative delta. Cumulative distribution by real volumes... A Z-score system...
And, of course, data from other pairs, even exotic ones unrelated to the one being forecast...
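For readers unfamiliar with the terms, here is one plausible reading of "cumulative delta" and "Z-score" as predictors, sketched under assumptions: cumulative delta is taken as the running sum of buy volume minus sell volume per bar, and the Z-score is computed against a trailing window. The volume figures are made up, and the function names are hypothetical.

```python
import math
from itertools import accumulate

def cumulative_delta(buy_volume, sell_volume):
    """Running sum of per-bar (buy volume - sell volume)."""
    return list(accumulate(b - s for b, s in zip(buy_volume, sell_volume)))

def rolling_zscore(series, window):
    """Z-score of each value against the preceding `window` values.

    Returns None for the first `window` positions, where there is not enough history.
    """
    out = []
    for i, value in enumerate(series):
        if i < window:
            out.append(None)
            continue
        history = series[i - window:i]
        mean = sum(history) / window
        std = math.sqrt(sum((v - mean) ** 2 for v in history) / window)
        out.append((value - mean) / std if std > 0 else 0.0)
    return out

# Made-up per-bar volumes (assumption): real data would come from a feed with real volumes.
buy = [120, 90, 150, 80, 200, 60, 170, 110]
sell = [100, 110, 90, 130, 70, 140, 95, 105]

cd = cumulative_delta(buy, sell)
z = rolling_zscore(cd, window=3)
print("cumulative delta:", cd)
print("z-scores:", [None if v is None else round(v, 2) for v in z])
```

Using only the preceding bars for the mean and deviation keeps the Z-score free of look-ahead, which matters if it is fed to a model as a predictor.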
 

In case anyone is interested: I found an R package called quantstrat that can simulate trading and build trading systems.

http://www.rinfinance.com/agenda/2013/workshop/Humme+Peterson.pdf
