Market etiquette or good manners in a minefield - page 58

 
gpwr >> :

I think you misunderstood me again.

I really don't understand how you are going to influence the coefficients of this polynomial in order to find the global minimum of the error (i.e. to do the learning). Here are the weights of the trained neuron:


Three experiments on the same vector. The rightmost experiment was the most successful. What I do understand is that, given a ready topology (an already trained net), it is not hard, in theory, to pick a polynomial that smooths that topology quite nicely. But explain to me: how are you going to compute this topology for an untrained net? In other words, what is the algorithm for influencing the coefficients so that the learning error function decreases? Do you know it?
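(An aside for readers following the thread: the textbook answer to "how do you influence the coefficients so the error falls" is gradient descent, i.e. step every weight against the gradient of the squared error. A minimal sketch on toy data; all names and numbers below are my own illustration, not code from anyone in this thread:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: inputs X (rows are samples) and targets y.
X = rng.normal(size=(100, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w

w = np.zeros(3)   # the "coefficients" being trained
lr = 0.05         # learning rate

for epoch in range(200):
    err = X @ w - y                 # residual on every sample
    grad = 2 * X.T @ err / len(y)   # gradient of the mean squared error
    w -= lr * grad                  # step against the gradient

print(w)  # approaches true_w as the error shrinks
```

Backpropagation is the same idea pushed through a multilayer net with the chain rule.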

 
Neutron >> :

Built it especially for you:

You can clearly see that the phase delay is always present and is visible to the eye on the sharp movements of the quotes.

Sergey, I will not discuss this topic with you any more, because it is trivial and completely useless. Learn the math, and the next time you come up with your next super-duper brilliant idea, for whose implementation you think you need a research institute or two and a cluster of PCs, stop and think for a minute: maybe you simply don't know or don't understand something. After all, that is more likely than an "epochal discovery" in an area that was trodden flat long before you.


OK, let us take it that phase delay (that is the term) exists in two cases



Frankly, you are starting to bore me too :o)

 
HideYourRichess >> :

I would be surprised to see an algorithm that demonstrates those very 80%. I am still looking for the error. It all looks too simple. It doesn't work like that.

No surprise there: if even I, not a mathematician, got it right, what is there to say about a pro! :о)))

 

to Neutron


Is there a phase delay between High/Low and Close? :о))) So, according to your visual method, there is one:




Where can it come from?


Corrections and addendum: while no one is watching, I will make a small correction. In my haste I made a small mistake: the picture above shows Open and Close. One signal is delayed relative to the other, but in this particular case it is not a phase shift.


There is no phase delay here. No mathematical operator has been applied that would cause a shift, and a phase shift does not appear out of thin air. What there is instead is a choice of process, a rule that says "this is the process".


If "shift" is understood as Open coming first and Close coming second, then yes, there is a "shift" (I am not going to argue with that). But I do not even know what mathematical methods would "find" the shift in this particular case: these signals are alternatives to each other.
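(For completeness: the textbook tool for "finding" a lag between two sampled signals is the peak of their cross-correlation; whether that is meaningful for Open and Close is exactly what is being disputed here. A sketch on synthetic data, everything below being my own illustration:)

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = np.roll(a, 3)   # b is (circularly) a delayed copy of a, by 3 samples

# Full cross-correlation; the index of its peak gives the lag estimate.
corr = np.correlate(b, a, mode="full")
lag = corr.argmax() - (len(a) - 1)
print(lag)  # 3
```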




And to choose Close for prediction you would need an incredibly accurate system. As for my simple idea, which by the way is very "robotic", no delay of (H+L)/2 will have any effect on it at all.


PS: Good grief, Seryoga, Seryoga: these processes are one and the same thing, absolutely. That's it, goodbye now. Good luck.

 

to Neutron

While I am waiting for the new Mathcad, I am going over what I have already learned, i.e. playing with the single layer. You asked me to show the error vector length, and this is what I got:


X statistics, L length (if I got it right).

Calculated this way:


Here i is the loop over the statistics, and X is the input vector (the sum runs over the entire length of the current training vector). The squared error is accumulated over the whole epoch, together with the squared training vector:


And at the end of an epoch, it is counted as follows:


Here n is the loop over epochs.

Is everything done correctly?
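(If I read the description above correctly, the squared error and the squared training vector are accumulated over the epoch, and a square root of their ratio is taken at the end. That could be computed like the sketch below. This is my reconstruction in Python, not the actual Mathcad worksheet, and the tanh single-layer output is an assumption:)

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.normal(size=(50, 4))   # training vectors for one epoch
w = rng.normal(size=4)         # current weights of the single layer
d = rng.normal(size=50)        # target values

# Accumulate squared error and squared target over the whole epoch...
err_sq = 0.0
vec_sq = 0.0
for i in range(len(d)):        # i: loop over the training sample
    y = np.tanh(X[i] @ w)      # single-layer output (assumed activation)
    err_sq += (d[i] - y) ** 2
    vec_sq += d[i] ** 2

# ...and at the end of the epoch take the normalized error length.
L = np.sqrt(err_sq / vec_sq)
print(L)   # near 1 for an untrained net, falling as training proceeds
```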

 

Judging from the figure, there is an error somewhere: we should see the network gradually learning (the error vector length decreasing) as we move from epoch to epoch, and this is not visible. As always, there could be a cartload of reasons. For instance, instead of the error vector length versus epoch, the graph may be showing that length for the already trained network (last epoch) as a function of the independent experiment number... That follows from your "X statistics" - what statistics? It is not a set of statistics that should be plotted here. And as for "L length": L is normalized to the length of the data vector, so it must lie near 1 and gradually decrease towards the end of training... We are seeing something different.

Here, take a look at how it should look:

Here the blue line shows the error vector length on the training sample (we are looking at how the net trains, not how it predicts). In total 200 training epochs were used, and k=1 was taken for clarity, to show that in this particular case the net trains completely (the error goes to zero) and simply learns the training sample by heart, which happens even faster. The trouble is that on a test sample, with such weights, our adder will show the weather in Africa, i.e. it is completely deprived of generalization ability. The red lines in the figure show the spread (dispersion) over a series of experiments (n=50), and the blue one the average (I do gather statistics, but in a different way than you do; I will tell you about it later).
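(The memorization effect described here, zero error on the training sample and nonsense on the test sample, is easy to reproduce. A sketch with a plain linear adder, using a direct least-squares fit instead of epoch-by-epoch training for brevity; every detail is my own illustration:)

```python
import numpy as np

rng = np.random.default_rng(3)

n_w = 20                                # as many weights...
X_train = rng.normal(size=(n_w, n_w))   # ...as training samples
d_train = rng.normal(size=n_w)

# Least squares drives the training error to (numerically) zero:
w, *_ = np.linalg.lstsq(X_train, d_train, rcond=None)
train_err = np.linalg.norm(X_train @ w - d_train)

# ...but on fresh data the same weights are useless:
X_test = rng.normal(size=(n_w, n_w))
d_test = rng.normal(size=n_w)
test_err = np.linalg.norm(X_test @ w - d_test) / np.linalg.norm(d_test)

print(train_err)   # ~0: the sample is learned "by heart"
print(test_err)    # order 1 or worse: no generalization at all
```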

Your last two expressions are almost correct, except that there should not be an index over the statistics (you do only ONE experiment, and you need new code, without the set of statistics), and I don't understand the first equation. Where does it come from? I have a similar block that looks like this:

Here j is the loop over the training vector. Notice where my indices stand when the term is squared!

P.S. By the way, I gave up the squashing function for the weights, first for the single layer and then for the two-layer. Without it the results are just as good and there is less hassle.

 
grasn >> :

No surprise there: if even I, not a mathematician, got it right, what is there to say about a pro! :о)))


Figured it out. What I was doing could be considered a primitive version of AR, or vice versa, AR could be considered an improved version of what I was doing.

 
Neutron >> :

Your last two expressions are almost correct, except that there should not be an index over the statistics (you do only ONE experiment, and you need new code, without the set of statistics), and I don't understand the first equation. Where does it come from? I have a similar block that looks like this:

Here j is the loop over the training vector. Notice where my indices stand when the term is squared!

P.S. By the way, I gave up the squashing function for the weights, first for the single layer and then for the two-layer. Without it the results are just as good and there is less hassle.

The first equation calculates the error vector length and normalizes it by the data vector length (at least as I understand it so far). The reason is probably that I really do need new code, without the set of statistics. I will do it now.

As for the squashing function, it did not work for me straight away (i.e. the result was not obvious), so I did not use it.

 
paralocus wrote >>

The first equation calculates the error vector length and normalizes it by the data vector length (at least as I understand it so far)

What then do the last two expressions represent?

I thought the second one finds the squares of the vector lengths, and the third the normalized length. If so, what is the first expression for?

 
HideYourRichess >> :

I've got it figured out. What I did can be considered a primitive version of AR, or vice versa, AR can be considered an improved version of what I did.

I didn't include model identification, i.e. the optimal choice of sample length and model order. With those, I think it is possible to get to 90%. I have no doubt at all that your results will be just as good, or even better. ;)
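(For readers who want to try the AR route: an autoregressive model of order p predicts the next sample as a weighted sum of the previous p samples, with the weights fitted by least squares. A minimal sketch with the order and sample length fixed by hand, i.e. exactly without the model identification mentioned above; everything here is illustrative:)

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic series with some memory in it: x[t] = 0.6 x[t-1] - 0.3 x[t-2] + noise.
x = np.zeros(300)
for t in range(2, 300):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.1)

p = 2   # model order, chosen by hand

# Regression matrix: row i holds the p samples preceding x[p + i].
A = np.column_stack([x[p - 1 - k : len(x) - 1 - k] for k in range(p)])
b = x[p:]
coef, *_ = np.linalg.lstsq(A, b, rcond=None)

print(coef)                                 # close to [0.6, -0.3]
pred = coef[0] * x[-1] + coef[1] * x[-2]    # one-step-ahead forecast
```

Model identification would then mean picking p (e.g. by AIC) and the fitting window instead of hard-coding them.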
