Market etiquette or good manners in a minefield

 
registred wrote >>

Gentlemen, can you tell me how you deal with falling into shallow local minima and with poorly chosen initial weights? I understand that at the beginning they have no effect on training, but later they start to affect the results very strongly.

I have made it a rule to retrain the network at each step. Obviously, in this setup the network may occasionally be "out of place", but at the next step it is right back where it needs to be. The idea is that the probability of the network learning the "wrong" thing is noticeably less than 1, so over a large sample of retrainings the contribution of the "crazy ones" is minimal.
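A rough sketch of what that could look like in MQL4 terms (all names here - InitWeightsRandom(), TrainNet(), PredictSign() - are hypothetical placeholders, not code from this thread):

// On every new bar: fresh weights, a full training pass, then a forecast.
void OnEveryStep(int bar)
  {
   InitWeightsRandom();           // hypothetical: small random initial weights
   TrainNet(bar, 100);            // hypothetical: train for a fixed number of epochs
   int sign = PredictSign(bar);   // hypothetical: +1/-1 sign of the next increment
  }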

paralocus wrote >>

I'm a little confused about how to reduce the contribution multiplier of each successive epoch... By the end of training I end up with very small output-layer weights and, on the contrary, large hidden-layer weights.

Alert: W2[0]=-0.0414 W2[1]=0.0188 W2[2]=-0.0539

Alert: W1[1,0]=-27.0731 W1[1,1]=-30.2069 W1[1,2]=37.6292 W1[1,3]=30.4359 W1[1,4]=-22.7556 W1[1,5]=-37.5899

Here you will have to think for yourself. I only predict the signs of the expected price increments (+/-1). This has to do with the specifics of trading (see the basic equation of trading a few posts above), and with the fact that trying to predict the amplitude and the sign simultaneously makes the task catastrophically harder (the size of the network architecture and the number of training epochs grow). No home PC would have enough power for that, even if we didn't retrain the grid at every step!

So, traditionally, when predicting absolute values of the series, the learning error is monitored at each epoch until it drops below some threshold. This process may not converge - the grid hangs in an infinite loop, and mechanisms are needed to bring it out of its comatose state. When I experimented with this, I monitored the rate at which the learning error decreased, and when the condition was triggered I re-initialized (randomized) all the weights, i.e. practically started training all over again. In doing so I had to find the approximate number of epochs needed for training, and the weight of each epoch (the coefficient in front of each weight correction) decreased according to the law 1 - j/N, where j runs from 1 to N.

After I abandoned predicting the amplitude, the network began to learn quickly and efficiently, so it became possible to introduce a fixed number of training epochs without monitoring the learning error.
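For illustration only - this is one reading of the schedule described above, not Neutron's actual code. The per-epoch coefficient decays as 1 - j/N, so early epochs correct the weights strongly and the last one contributes almost nothing. BackpropDeltas() is a hypothetical placeholder for whatever produces the raw corrections:

// Sketch: fixed number of epochs, each weight correction scaled by (1 - j/N).
void TrainFixedEpochs(double &W[], int N)
  {
   double dW[];
   ArrayResize(dW, ArraySize(W));
   for(int j = 1; j <= N; j++)
     {
      double k = 1.0 - (double)j / N;       // epoch weight: ~1 at the start, 0 at the end
      BackpropDeltas(W, dW);                // hypothetical: raw backprop corrections
      for(int i = 0; i < ArraySize(W); i++)
         W[i] += k * dW[i];                 // apply the scaled correction
     }
  }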

Also, going from one forecast to the next, to reduce the number of epochs I kept the network's weight values instead of randomizing them. Sometimes some weights would "stick", which showed up as their unbounded growth or a drift towards zero. I dealt with it this way: before making a new forecast I passed all the weights through the th() operator. It worked efficiently.
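In case it helps anyone, a minimal sketch of that trick as described (my code, not Neutron's); MQL4 has no built-in tanh, so th() is assembled from MathExp():

double th(double x)                        // hyperbolic tangent via MathExp()
  {
   double e = MathExp(2.0 * x);
   return (e - 1.0) / (e + 1.0);
  }

// Before each new forecast, squash every weight into (-1, 1): a runaway
// weight gets clamped near +/-1, while small weights stay almost unchanged.
void SquashWeights(double &W[])
  {
   for(int i = 0; i < ArraySize(W); i++)
      W[i] = th(W[i]);
  }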

 
Neutron >> :
... A fixed number of training epochs without monitoring the learning error.

>> the issue has been resolved!

 

to Neutron

I'm in the process of rewriting my whole two-layer net into a more compact form. I want to reduce it all to matrix operations in two or three functions. As soon as I finish, I'll post it.

At the same time I'll "cut out" the amplitude prediction. In fact - the sign is more than enough.
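A minimal sketch of that matrix form, with assumed sizes (6 inputs, nHid hidden neurons) - not the final code:

double th(double x)                        // tanh via MathExp(), applied element-wise
  {
   double e = MathExp(2.0 * x);
   return (e - 1.0) / (e + 1.0);
  }

// Whole forward pass as two matrix-vector products: out = th(W2 . th(W1 . in)).
double Forward(double &in[], double &W1[][6], double &W2[], int nHid)
  {
   double out = 0.0;
   for(int i = 0; i < nHid; i++)
     {
      double s = 0.0;
      for(int j = 0; j < 6; j++)
         s += W1[i][j] * in[j];            // row i of W1 times the input vector
      out += W2[i] * th(s);                // hidden activation weighted by W2[i]
     }
   return(th(out));                        // the sign of the result is the forecast
  }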

 
Neutron >> :

Here you will have to think for yourself.

You mean think about how to go from calculating amplitude error to calculating sign error?

You mean here?

d_2_out = test - out;                                             // Error at the grid output
 

No. I was talking in general terms. Obviously, you'll go your own way...
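(For illustration only - one possible reading, not Neutron's recipe: with a sign-only target, the line quoted above could compare the output against the sign of the actual increment.)

double target = (test > 0.0) ? 1.0 : -1.0;   // assumed: sign of the actual increment
d_2_out = target - out;                       // error at the grid output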

 

There are a couple of questions that, to avoid unnecessary agitation here, I'd like to ask in private.

I know you're not an amateur...

May I?

 
Ask away!
 
Neutron >> :
Ask away!

>> :)

 
Answered.
 
Neutron >> :

I have made it a rule to retrain the network at each step. Obviously, in this setup the network may occasionally be "out of place", but at the next step it is right back where it needs to be. The idea is that the probability of the network learning the "wrong" thing is noticeably less than 1, so over a large sample of retrainings the contribution of the "crazy ones" is minimal.

I'm not quite sure what you mean by "retraining at every step".
