Market etiquette or good manners in a minefield - page 42

 
Neutron >> :

Mathcad lets you visualise the computation at any step. Experiment.

How do you manage that? I only get graphs once all the calculations are complete.

 

Me too :-(

What I meant was that you can display any information you are interested in and present it in a suitable form for later analysis.

 
You wrote somewhere that over the whole training vector the network learns to recognise only the (n+1)-th sample. Should this be understood to mean that for N epochs the position of the training target does not move? That is actually what I did (in that listing with the bunch of different functions): over the whole vector the error was computed as TEST-OUT, where TEST was the (n+1)-th sample. Now, in this implementation, TEST is the sample that follows the last (largest) input, i.e. it shifts over the whole section from the (d+1)-th to the (n+1)-th. Could there be an error here?
 

Wait a minute.

We feed a data vector of length d to the input of the NS, where d is the number of information inputs of the net (not counting the bias). At the OUTPUT of the net, to train it, we present the (d+1)-th sample. The weights are random at this step. At this step we obtain a delta correction for each weight; remember it, but do not correct anything yet. Shift by one bar and repeat the procedure, accumulating the corrections and, separately, their squares. Repeat the procedure three times (still without correcting the weights). Only then correct all the weights: this is the FIRST epoch of training. Now start all over again on the same data, but from the weights already found, and correct the weights at the end: this is the SECOND epoch. Do this 100 times (for example), which makes 100 epochs of training. That is all, the net is ready to forecast: feed it the whole data vector ending with the most recent sample and receive the forecast. Once the real (not predicted) sample arrives, retrain the net again, with re-randomisation of the weights.
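As a sketch only (the thread gives no code; the single linear neuron, the learning rate, and all names here are my assumptions), the epoch scheme above might look like this:

```python
import random

def train_epochs(data, d, epochs=100, lr=0.01):
    """Epoch-wise training of one linear neuron with d inputs.

    data : list of floats (the sample series)
    d    : number of information inputs (bias not counted)
    """
    # random starting weights: d inputs plus a bias
    w = [random.uniform(-1.0, 1.0) for _ in range(d + 1)]
    for _ in range(epochs):
        delta = [0.0] * (d + 1)           # accumulated corrections
        # slide over the training vector; no weight updates yet
        for t in range(len(data) - d):
            x = data[t:t + d] + [1.0]     # input window + bias input
            out = sum(wi * xi for wi, xi in zip(w, x))
            err = data[t + d] - out       # target is the (d+1)-th sample
            for i in range(d + 1):
                delta[i] += lr * err * x[i]
        # only now, once per epoch, correct all the weights
        w = [wi + di for wi, di in zip(w, delta)]
    return w

def predict(w, data, d):
    """Forecast from the most recent d samples."""
    x = data[-d:] + [1.0]
    return sum(wi * xi for wi, xi in zip(w, x))
```

Note that the weights are corrected once per epoch, after the deltas from the whole vector have been accumulated, i.e. ordinary batch gradient descent.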

 
Then there is no error.
 

Interesting discussion :) Neutron, by the way, you still haven't answered my question about the starting weights. You only talked about how you retrain the net. But even so, if you retrain the net even once, it will carry an error, and possibly a significant one. I'm speaking from my experience with backprop :). Actually, this is the only question that interests me; everything else in the net technique is unimportant. And committees of nets: what is your view, are they needed at all? Maybe the weights can be tuned initially so that the net starts training near the global minimum, and then committees simply wouldn't be needed?

 
paralocus wrote >>
Then there is no error.

As I move from epoch to epoch, I additionally apply such a "pull-back" to each weight w:

This prevents the weights from creeping into the saturation area and keeps them in the +/-5 range during training.
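The formula itself did not survive in this copy of the thread, so the following is only a guess at an equivalent rule: a smooth squash that is near the identity for small weights but never lets any weight leave the ±5 range. The tanh form is my assumption, not Neutron's actual formula.

```python
import math

W_MAX = 5.0  # target range +/-5 mentioned in the post

def pull_back(weights):
    """Squash each weight smoothly back towards the +/-W_MAX range.

    Assumed rule: w -> W_MAX * tanh(w / W_MAX). It is close to the
    identity for small |w| but can never exceed +/-W_MAX, so the
    weights cannot creep into the saturation area.
    """
    return [W_MAX * math.tanh(w / W_MAX) for w in weights]
```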

Registred wrote >>

Neutron, by the way, you still haven't answered my question about the initial weights. You only talked about how you retrain the net. But even so, if you retrain the net even once, it will carry an error, and possibly a significant one. I'm speaking from my experience with backprop :). In fact, this is the only question that interests me; everything else in the net technique is unimportant. And committees of nets: what is your view, are they needed at all? Maybe the weights can somehow be tweaked initially, so that the net starts training near the global minimum and committees simply wouldn't be needed?

I randomise the initial weights with a random value uniformly ("flat") distributed in the range ±1, and I do this at every bar. The net, retrained at every step, on average finds exactly the global minimum; this is an advantage of retraining at every step over a once-and-for-all trained net, which may accidentally fall into a local hole, after which all its predictions will be inadequate. In that case it really is important to look for ways to optimise the starting point for the weights. I have not solved this problem.
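A minimal sketch of this per-bar scheme, with the training and prediction routines left as stand-ins (the names are mine):

```python
import random

def init_weights(n, a=1.0):
    """Fresh starting weights, uniformly ('flat') distributed in +/-a."""
    return [random.uniform(-a, a) for _ in range(n)]

def on_new_bar(history, d, train, predict):
    """Retrain from scratch on every bar, then forecast the next one.

    train/predict are stand-ins for whatever training scheme is used
    (e.g. the epoch procedure described earlier in the thread).
    """
    w = init_weights(d + 1)     # re-randomise the weights at every bar
    w = train(w, history, d)    # retrain on the latest data
    return predict(w, history, d)
```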

As for committees of nets, that is a useful thing, but resource-intensive. One can show that simply increasing the number of neurons in the hidden layer is in essence a committee, more resource-intensive than the classic version, but more powerful because of the integrated nonlinearity of the committee members. This is where you should experiment.
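The claim that a wider hidden layer subsumes a committee can be made concrete: averaging k independently built one-hidden-layer nets is algebraically the same as one net whose hidden layer contains all k members' hidden neurons, with the output weights divided by k. A sketch (the toy nets are my own construction, not code from the thread):

```python
import math

def tanh_net(w_hidden, w_out):
    """One-hidden-layer net: out = sum_j w_out[j] * tanh(<w_hidden[j], x>)."""
    def f(x):
        return sum(wo * math.tanh(sum(wi * xi for wi, xi in zip(wh, x)))
                   for wh, wo in zip(w_hidden, w_out))
    return f

def committee(members):
    """Committee forecast: the mean of the member nets' outputs."""
    def f(x):
        return sum(m(x) for m in members) / len(members)
    return f
```

Merging the members' hidden neurons into one net and dividing the output weights by k reproduces the committee exactly; the wider net is strictly more general because training can then move those output weights independently.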

 
Neutron >> :

As I move from epoch to epoch, I additionally apply such a "pull-back" to each weight w:

This prevents the weights from creeping into the saturation area and keeps them in the ±5 range during training.

I simply compress the interval when normalising: not to [-1;1] but to [-0.9;0.9]; the effect is the same, there is no saturation. I meant something slightly different: in the course of weight adjustment, the initial weight coefficients may simply never reach the optimal values because of the so-called flatness (gentle slope) of the function. I am struggling with this, to be honest. And you probably have not solved this problem either, which is why it is hard to get anything worthwhile out of the market with backprop, even when the data base for modelling is good.
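For reference, the compressed normalisation described here can be sketched as a plain min-max rescale into ±0.9 (the exact transform registred uses is not shown in the thread, so this is an assumption):

```python
def normalize(series, bound=0.9):
    """Min-max rescale a series into [-bound, +bound] instead of [-1, +1].

    Keeping inputs slightly inside +/-1 leaves headroom before
    tanh-style activations saturate.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(2.0 * (v - lo) / (hi - lo) - 1.0) * bound for v in series]
```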

 
Neutron >> :

I randomise the initial weights with a random value uniformly ("flat") distributed in the range ±1, and I do this at every bar. The net, retrained at every step, on average finds exactly the global minimum; this is an advantage of retraining at every step over a once-and-for-all trained net, which may accidentally fall into a local hole, after which all its predictions will be inadequate. In that case it really is important to look for ways to optimise the starting point for the weights. I have not solved this problem.

As for committees of nets, that is a useful thing, but resource-intensive. One can show that simply increasing the number of neurons in the hidden layer is in essence a committee, more resource-intensive than the classic version, but more powerful because of the integrated nonlinearity of the committee members. You need to experiment here.

That is exactly what spoils the whole thing :). By the way, from my observations, the weight randomisation with which the net learns fastest is in the interval [-0.07; +0.07]. I don't know why that is :)

 
registred wrote >>

I simply compress the interval when normalising: not to [-1;1] but to [-0.9;0.9]; the effect is the same, there is no saturation. I meant something slightly different: in the course of weight adjustment, the initial weight coefficients may simply never reach the optimal values because of the so-called flatness (gentle slope) of the function. I am struggling with this, to be honest. And you probably have not solved this problem either, which is why it is hard to get anything worthwhile out of the market with backprop, even when the data base for modelling is good.

God will provide! The backprop procedure is not complicated, and NS training does not suffer from it; it is an effective method.

As for not reaching the optimal values, that is pure bluff for our BPs. I understand if you are predicting a sine wave! There, yes, optimal values exist. But what are they in market choppiness? Right now the optimum is here, and at the next step (the one you are predicting) it is there... and you have been searching for it "here" with all your might. In short, there is no problem of precise localisation, and it is solved satisfactorily by retraining at every step.
