Article: Price forecasting with neural networks - page 13

 
Ulterior:

Good time of day to you, and sorry for the translit.

NO PRAISE!

 
Neutron:

OK, but I'd like a theory.

Here is the reasoning out loud. For a single-layer NS with N inputs we have N synapses, and in the general nonlinear case their weights are uniquely determined by a system of N equations. Clearly, to solve such a system we need a training sample of N vectors, each consisting of N elements; it cannot work out any other way. For a two-layer NS the number of inputs must be smaller than the total number of training vectors N by n, where n is the number of synapses in the second layer, so the training vectors have length N - n.

For a 3-layer NS, the order of reasoning is the same.

Thus:

1. We proceed from the depth of immersion we need and determine the dimensionality of the NS input.

2. Then, given the architecture (number of layers) of the NS, we count the number of synapses and obtain the optimal size of the training sample (a small counting sketch follows this list).
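To make step 2 concrete, here is a minimal counting sketch in Python (my own illustration, not code from the thread); it assumes a fully connected net and ignores bias terms:

def num_synapses(layer_sizes):
    # Number of weights in a fully connected net with the given layer sizes.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Example: 10 inputs (the chosen depth of immersion), a hidden layer of 5
# neurons and 1 output give 10*5 + 5*1 = 55 synapses, so by the reasoning
# above the training sample should contain about 55 vectors.
print(num_synapses([10, 5, 1]))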


The reasoning is good, but it is empirical... i.e. it is hard to turn it into an algorithm that depends only on the type and representation of the input data.

 
Sergey_Murzinov:

One of the most important things (in my opinion) is data preparation. To do this:

1. Try to reduce the correlation of inputs. In other words, inputs should be as statistically independent as possible.

2. When normalising the input vectors, aim to increase their entropy, thereby increasing the amount of information fed to the NS while keeping the volume of input data the same.

A data quality check is obligatory, for instance with the Kolmogorov-Smirnov test and/or the Hurst exponent (a rough sketch of these checks is given below).
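A rough sketch of these checks in Python (my own illustration; the post names the methods, not the code, and numpy/scipy are assumed to be available):

import numpy as np
from scipy.stats import ks_2samp

prices = np.cumsum(np.random.randn(2000)) + 100.0      # stand-in for a price series
returns = np.diff(prices)

# 1. Correlation of the inputs (here: several lagged returns used as NS inputs);
#    the closer the off-diagonal values are to zero, the more independent they are.
n_lags = 5
X = np.column_stack([returns[i:len(returns) - n_lags + i] for i in range(n_lags)])
print(np.corrcoef(X, rowvar=False).round(2))

# 2. One common normalisation choice: zero mean and unit variance per input
#    (whether this actually increases entropy depends on the data).
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Kolmogorov-Smirnov: do two halves of the sample follow the same distribution?
stat, p_value = ks_2samp(returns[:len(returns) // 2], returns[len(returns) // 2:])
print("KS statistic %.3f, p-value %.3f" % (stat, p_value))

# Hurst exponent from the scaling of the std of increments with the lag.
lags = range(2, 100)
tau = [np.std(prices[lag:] - prices[:-lag]) for lag in lags]
hurst = np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]
print("Hurst exponent ~ %.2f" % hurst)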


By selecting the network architecture we can reduce the error.

I disagree: any transformation of the data changes the informativeness of the original sample. And where is the criterion for assessing that change in the data? I think everyone involved in NS knows the old story from one of the earlier NS booms, when the U.S. military tried to use NS in ground-to-ground firing systems. One of the components was a pattern-recognition network, which trained very successfully and so on, but for some reason it only recognized tanks in rainy weather, and when it was presented with data from a different landscape it failed badly. The reason was that the network had learned to recognize the landscape well, but not the tank.


If we change the data, the problem may be the same: the network will pick out only the sequences (modes) that belong to the normalising series, because in principle the normalisation process itself can be regarded as a component of the fluctuations of the time series.

 
I have now built a two-layer nonlinear NS and can see that its predictive ability is only slightly higher than that of the single-layer nonlinear one, but the scatter of its predictions is noticeably smaller, which is nice. It can also be noted that the NS shows noticeable smoothing properties as the number of inputs is increased. In my case, however, the number of training vectors grows proportionally, so the cause may simply be the larger training sample. These two effects need to be separated somehow... (one possible way is sketched below).
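One possible way to separate the two effects (my own suggestion, not something from the thread): vary the number of inputs while holding the number of training vectors fixed, and then vary the training-set size at a fixed number of inputs. A sketch of the first half:

import numpy as np

def make_dataset(series, n_inputs, n_vectors):
    # Sliding-window sample: n_vectors rows of n_inputs consecutive values,
    # each paired with the value that follows the window.
    X = np.array([series[i:i + n_inputs] for i in range(n_vectors)])
    y = series[n_inputs:n_inputs + n_vectors]
    return X, y

series = np.cumsum(np.random.randn(3000))                # stand-in for the series
for n_inputs in (5, 10, 20):
    X, y = make_dataset(series, n_inputs, n_vectors=500)  # sample size held fixed
    # ...train the same NS on (X, y) each time and compare the smoothing of
    # its forecasts; any change is then due to the input count alone.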
 

This shows the results of testing the predictive ability of the two NS.


The figure shows the original time series in red, the one-bar-ahead prediction of the linear single-layer network in blue, and that of the nonlinear two-layer network in green. The depth of immersion is the same in both cases. It can be seen that, for this artificial case, there is a noticeable lag of the predicted data on the trending sections of the series. I wonder whether my more experienced colleagues observe this effect and, if so, what it could be related to?
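A small check one could run to quantify such a lag (my own sketch; 'actual' and 'predicted' stand for hypothetical equal-length arrays of the series and of the one-bar-ahead forecasts aligned to it):

import numpy as np

def forecast_lag(actual, predicted, max_lag=10):
    # Lag (in bars) at which the forecast best matches the series:
    # 0 means they are aligned, 1 means the forecast mostly repeats the
    # previous bar, and so on.
    def corr(lag):
        if lag == 0:
            return np.corrcoef(actual, predicted)[0, 1]
        return np.corrcoef(actual[:-lag], predicted[lag:])[0, 1]
    return max(range(max_lag + 1), key=corr)

# Example use: forecast_lag(test_series, nn_forecast) on the out-of-sample section.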

 

 
Sergey_Murzinov:

rip wrote:


I disagree: any transformation of the data changes the informativeness of the original sample. And where is the criterion for assessing that change in the data? I think everyone involved in NS knows the old story from one of the earlier NS booms, when the U.S. military tried to use NS in ground-to-ground firing systems. One of the components was a pattern-recognition network, which trained very successfully and so on, but for some reason it only recognized tanks in rainy weather, and when it was presented with data from a different landscape it failed badly. The reason was that the network had learned to recognize the landscape well, but not the tank.


If we change the data, the problem may be the same: the network will pick out only the sequences (modes) that belong to the normalising series, because in principle the normalisation process itself can be regarded as a component of the fluctuations of the time series.

You are not the only one to say so. This is the view held by the majority. I am glad.

Opinion is opinion, but in practice things are different. I've tried using a genetic algorithm to optimize the network architecture: the results are interesting, but it takes a very long time to compute.
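For what it's worth, a toy sketch of that idea in Python (my own illustration with a made-up fitness function, not Sergey's implementation): the genome is a list of hidden-layer sizes.

import random

def fitness(hidden_sizes):
    # In a real run this would train an NS with these hidden-layer sizes and
    # return, say, the negative prediction error on a validation set.
    return -sum((h - 7) ** 2 for h in hidden_sizes)     # made-up stand-in

def mutate(ind):
    child = list(ind)
    i = random.randrange(len(child))
    child[i] = max(1, child[i] + random.choice((-2, -1, 1, 2)))
    return child

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(1, 20) for _ in range(2)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                           # keep the better half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
print("best architecture found:", max(population, key=fitness))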

 

Here is another comparison of the predictive power of the networks.


This shows the one-step-ahead predictions of a single-layer nonlinear NS (blue line) and of a two-layer nonlinear NS. Here the single-layer network looks preferable.

It is interesting to compare these test results with those of other NS architectures. Can anyone post their results for comparison?
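To make results from different architectures directly comparable, one could agree on a couple of simple figures; here is a possible sketch (my own, with hypothetical arrays: 'actual' is the true series on the test section and 'predicted' is a network's one-bar-ahead forecast aligned to it):

import numpy as np

def forecast_stats(actual, predicted):
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    # Share of bars where the forecast got the direction of the next move right.
    actual_move = np.sign(actual[1:] - actual[:-1])
    predicted_move = np.sign(predicted[1:] - actual[:-1])
    hit_rate = np.mean(actual_move == predicted_move)
    return rmse, hit_rate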

 

rip wrote:


Opinion is opinion, but in practice things are different. I've tried using a genetic algorithm to optimize the network architecture: the results are interesting, but it takes a very long time to compute.

If about 24 hours counts as a very long time, then yes. Just note that when you actually work with (operate) the networks, this is done only once.

 

to Neutron

No results so far, slowly getting the hang of it.

Regarding the first post (with the first picture): I am trying to understand what you mean. I personally did not see a noticeable lag. On the contrary, in areas where there is some kind of pattern, the network identifies it fairly well, and the error appears only at the breaks in trends (and those are either random here, or the sample size is not sufficient to establish any pattern in their appearance).
