Neural network in the form of a script - page 9

 
Andy_Kon wrote >>

How important is accuracy for the sigmoid?

Past 20 the accuracy is already out in the 9th digit...


In training it matters. Sometimes it has a very serious effect on convergence, up to the point where the network fails to learn at all.

This should not be a problem when using GA for training.
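A small illustration of why the accuracy question matters mostly for gradient training: in double precision the sigmoid differs from 1 only in the 9th digit once its argument passes 20, and its derivative is practically zero there, so BP steps (which are proportional to that derivative) stall, while a GA never looks at the derivative at all. A minimal C sketch (the function names are mine):

#include <stdio.h>
#include <math.h>

double sigmoid(double x)       { return 1.0 / (1.0 + exp(-x)); }
double sigmoid_deriv(double x) { double s = sigmoid(x); return s * (1.0 - s); }

int main(void)
{
    double xs[] = { 1.0, 5.0, 10.0, 20.0, 30.0 };
    for (int i = 0; i < 5; i++)
        printf("x=%5.1f  sigmoid=%.12f  derivative=%.3e\n",
               xs[i], sigmoid(xs[i]), sigmoid_deriv(xs[i]));
    /* at x=20 the derivative is about 2e-9: a BP weight step proportional
       to it barely moves anything, whereas a GA ignores gradients entirely */
    return 0;
}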

 
YuraZ wrote >>

3 - what to feed in (how many inputs is clear)

4 - when to retrain

5 - which learning algorithm to choose

GA - genetic

BP - backpropagation


3. That really is a problem; it all depends on your imagination :).

4. Here is where you can cheat: make a network that is able to forget. It's not that hard.

For example, you could do the following (it works well when training with BP; a sketch of the loop follows after the pros and cons below):

Limit the number of patterns -- say, 1000.

When the next pattern appears (a new TF, say):

- delete the oldest pattern

- run one learning cycle

- teach the new pattern, say, 5 times



What do you get?

+ a training cycle does not take long

+ the old patterns are not forgotten right away

- the new pattern is not learnt right away either, but

+ it is learnt faster than the old ones are forgotten, thanks to the aggressive learning (5 passes at once)

- only suitable for BP training
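To make the scheme concrete, here is a minimal sketch of such a "forgetting" training loop in C. It is only an illustration of the idea above, not code from this thread: the pattern sizes and the train_one_pattern() routine (one BP pass over a single pattern) are assumptions.

#include <string.h>

#define MAX_PATTERNS        1000  /* size of the sliding window             */
#define NEW_PATTERN_REPEATS 5     /* aggressive learning of the newest one  */
#define N_INPUTS            10    /* placeholder sizes                      */
#define N_OUTPUTS           3

/* assumed to exist elsewhere: one BP pass over a single pattern */
void train_one_pattern(const double *in, const double *out);

static double inputs [MAX_PATTERNS][N_INPUTS];
static double targets[MAX_PATTERNS][N_OUTPUTS];
static int    count = 0, head = 0;

void on_new_pattern(const double *in, const double *out)
{
    int i, k;

    /* overwrite the oldest slot: this is the "forgetting" */
    memcpy(inputs [head], in,  sizeof(double) * N_INPUTS);
    memcpy(targets[head], out, sizeof(double) * N_OUTPUTS);
    head = (head + 1) % MAX_PATTERNS;
    if (count < MAX_PATTERNS) count++;

    /* one pass over the whole window, so old patterns fade only gradually */
    for (i = 0; i < count; i++)
        train_one_pattern(inputs[i], targets[i]);

    /* the newest pattern gets extra passes */
    for (k = 0; k < NEW_PATTERN_REPEATS; k++)
        train_one_pattern(in, out);
}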



5. That is a dilemma.

GA steadily descends toward the optimum and, by its nature, has some built-in protection against local minima, but it eats a lot of memory and is wildly slow.

BP does not guarantee a result, although the probability of success is very high (i.e. if a GA can train the net, BP will almost certainly train it too); it is faster and does not eat memory, but it needs extras such as momentum (to escape local minima) and an adaptive step (local minima get skipped as well, and the learning speed increases many times over).
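For clarity, this is roughly what "momentum" and an adaptive step look like in a BP weight update. A hedged C sketch with names and constants of my own choosing, not taken from the script discussed here:

/* one weight update with momentum; grad is dE/dw for this weight */
double update_weight(double w, double grad, double *prev_delta,
                     double learn_rate, double momentum)
{
    /* the momentum term keeps the weight moving through shallow local minima */
    double delta = -learn_rate * grad + momentum * (*prev_delta);
    *prev_delta  = delta;
    return w + delta;
}

/* a crude adaptive step: grow the rate while the epoch error keeps falling,
   cut it sharply as soon as the error goes up */
double adapt_rate(double rate, double err, double prev_err)
{
    return (err < prev_err) ? rate * 1.05 : rate * 0.5;
}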



I'm for BP.

 
TheXpert wrote >>

3. That really is a problem; it all depends on your imagination :).

4. Here is where you can cheat: make a network that is able to forget. It's not that hard.

For example, you could do the following (it works well when training with BP):

Limit the number of patterns -- say, 1000.

When the next pattern appears (a new TF, say):

- delete the oldest pattern

- run one learning cycle

- teach the new pattern, say, 5 times



What do you get?

+ a training cycle does not take long

+ the old patterns are not forgotten right away

- the new pattern is not learnt right away either, but

+ it is learnt faster than the old ones are forgotten, thanks to the aggressive learning (5 passes at once)

- only suitable for BP training



5. That is a dilemma.

GA steadily descends toward the optimum and, by its nature, has some built-in protection against local minima, but it eats a lot of memory and is wildly slow.

BP does not guarantee a result, although the probability of success is very high (i.e. if a GA can train the net, BP will almost certainly train it too); it is faster and does not eat memory, but it needs extras such as momentum (to escape local minima) and an adaptive step (local minima get skipped as well, and the learning speed increases many times over).



I'm for BP.


This is an example of GA at work.

The source is in C...


In terms of speed, GA is praised as being fast.

There is also MGA, which is even faster.


---

Basically, GA or MGA is just a fast search for a maximum or minimum,

and METAQUOTES, at least, used GA for its speed, not for anything else ...
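Not the code from dio.zip, but for readers who have not met a GA before, here is a stripped-down illustration in C of a GA-style search for the minimum of a function (selection plus mutation only, no crossover; all constants are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define POP  50
#define GENS 200

/* the function being minimised: the search only needs its value, no gradient */
double cost(double x) { return (x - 3.0) * (x - 3.0) + sin(5.0 * x); }

double rnd(double lo, double hi) { return lo + (hi - lo) * rand() / (double)RAND_MAX; }

int main(void)
{
    double pop[POP];
    int g, i, best;

    for (i = 0; i < POP; i++) pop[i] = rnd(-10.0, 10.0);

    for (g = 0; g < GENS; g++)
    {
        /* selection: find the fittest individual */
        best = 0;
        for (i = 1; i < POP; i++)
            if (cost(pop[i]) < cost(pop[best])) best = i;

        /* mutation: repopulate the rest around the best one */
        for (i = 0; i < POP; i++)
            if (i != best) pop[i] = pop[best] + rnd(-0.5, 0.5);

        if (g % 40 == 0)
            printf("gen %3d  x=%.4f  cost=%.6f\n", g, pop[best], cost(pop[best]));
    }
    return 0;
}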

Files:
dio.zip  4 kb
 
YuraZ wrote >>


In terms of speed, GA is praised as being fast.

There is also MGA, which is even faster.


---

Basically, GA or MGA is just a fast search for a maximum or minimum,

and the METAQUOTES experts, at least, used GA in their tester for its speed, not for anything else ...


GA is definitely slower than BP.

Metaquotes applied it quite correctly, since GA is a wildly versatile thing, and of course it is faster than simple brute-force search.

The point is the same as with neural networks in general -- it is not the network that recognizes letters, but specialized algorithms; FR (FineReader) does not use networks for that.

Similarly, for training it is better to use specialized algorithms; they are a priori better.

 
TheXpert wrote >>

GA is definitely slower than BP.

Metaquotes applied it quite correctly, since GA is a wildly versatile thing, and of course it is faster than simple brute-force search.

The point is the same as with neural networks in general -- it is not the network that recognizes letters, but specialized algorithms; FR (FineReader) does not use networks for that.

Similarly, for training it is better to use specialized algorithms; they are a priori better.

Where does this information come from, if it's not a secret? And what about training FR by examples then?

And what is the difference, in principle, between a specialized algorithm and a trained neural network?

 
Sergey_Murzinov wrote >>

Where does this information come from, if it's not a secret? And what about training FR by examples then?

And what is the difference, in principle, between a specialized algorithm and a trained neural network?

Well, first of all, neural networks are not the best solution for character recognition. Although the neocognitron reached 99.7% (with a dictionary) even on rotated characters, that is really not the point.

Go to RSDN.ru and read the threads on neural networks. There are some very smart guys there; by the way, I think you can find some of them here too :).


As for learning by example, it goes something like this (a sketch of the Fourier step follows after the list):

- vectorize the character (build a skeleton)

- count the intersections and their relative positions

- take, say, the first few Fourier transform coefficients, normalizing beforehand for size invariance. By the way, with the FT you can also get rotation invariance, if I remember correctly.

- average over the examples

- store the result in the base as a reference
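The Fourier step above, as a hedged sketch in C: it assumes the outline of the character has already been reduced to M boundary points (the vectorization/skeleton and intersection steps are not shown), and the constants and names are placeholders of mine:

#include <math.h>

#define PI        3.14159265358979
#define M_POINTS  64   /* boundary points of one character outline           */
#define N_COEFF    4   /* first few Fourier magnitudes kept as the signature */

/* Treat the outline points as complex numbers z = x + i*y and take the
   magnitudes of the first few DFT coefficients.  Dropping the phases gives
   rotation insensitivity; dividing by the first harmonic gives size
   insensitivity. */
void fourier_signature(const double *x, const double *y, double *sig)
{
    for (int k = 1; k <= N_COEFF; k++)
    {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < M_POINTS; n++)
        {
            double a = -2.0 * PI * k * n / M_POINTS;
            re += x[n] * cos(a) - y[n] * sin(a);
            im += x[n] * sin(a) + y[n] * cos(a);
        }
        sig[k - 1] = sqrt(re * re + im * im);
    }
    for (int k = N_COEFF - 1; k >= 0; k--)   /* normalize by the 1st harmonic */
        sig[k] /= sig[0];
}

Averaging these signatures over the examples of one character and storing the mean vector would then be the "reference" from the last two list items.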


How is it different?

When you use a specialized algorithm, you know how it works.

With a neural network you don't. You only know that the particular function defined by the network interpolates the inputs into the outputs with a high degree of accuracy.

 

Yaaaaaaaaa!!!!!!!

By the way, RSDN.RU is a forum for programmers, not for developers of neural network algorithms and their applications.

For your information, FineReader (in its latest versions) does use neural network technology blocks. And to get a real idea of neural networks I advise reading specialist (not popular-science) literature and the threads of specialized forums.

And a trained network is itself a specialized algorithm. The simplest way to see this is to generate C code from NeuroShell2 -- the network code is right there in plain view.
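To illustrate that last point: code generators of this kind typically emit nothing more than a chain of weighted sums and sigmoids with the learned weights baked in as constants. The snippet below is only a hypothetical example of such output (the weights are made up, and it is not actual NeuroShell2 output):

#include <math.h>

/* hypothetical generated code for a 3-4-1 feed-forward network */
double net_output(double in0, double in1, double in2)
{
    double h0 = 1.0 / (1.0 + exp(-( 0.73*in0 - 1.12*in1 + 0.05*in2 + 0.31)));
    double h1 = 1.0 / (1.0 + exp(-(-0.44*in0 + 0.98*in1 + 1.67*in2 - 0.09)));
    double h2 = 1.0 / (1.0 + exp(-( 1.05*in0 + 0.12*in1 - 0.77*in2 + 0.55)));
    double h3 = 1.0 / (1.0 + exp(-(-0.21*in0 - 0.63*in1 + 0.34*in2 - 1.20)));
    return 1.0 / (1.0 + exp(-( 1.30*h0 - 0.80*h1 + 0.60*h2 - 1.10*h3 + 0.20)));
}

Once the weights are frozen like this, the network really is just another deterministic algorithm, only one whose coefficients were found by training rather than written by hand.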

 
Sergey_Murzinov wrote >>

Just so you know, FineReader (in its latest versions) uses neural network technology blocks. And to get an idea of neural networks I advise reading specialist (not popular-science) literature and the threads of specialized forums.

OK, may I have some links on the subject?
 

2 YuraZ

Right away I want to say thank you very much for your neural network code.

After going through it in detail, I understood that you use the neuron bias (threshold) method (the _threshold and _t_change arrays) and also the momentum method (the Momentum parameter) to speed up the calculations.

I have a few questions about its implementation.
1) You change _t_change in the weight-correction function, but then never use that correction anywhere to calculate the new _threshold array of weights.

2) When you pass the output to the sigmoid function, you subtract the _threshold parameter from the sum, although according to the literature this threshold input is not -1 but +1, so it should be added, not subtracted. All the more so since, when adjusting the weights, you feed in exactly +1 and not -1.
Actually, I have experimented with this threshold and momentum, and it turns out they really do speed up the calculations -- the time drops by a factor of several.

3) I was also interested in the sigmoid function. As I understand it, its particular parameters come from your practical experience in this field, but I assume you have read Wasserman, who writes that the {0,1} range is not optimal. The size of a weight correction is proportional to the output level, and a zero output level means the weight does not change. And since with binary input vectors half of the values are zero on average, the weights they are tied to do not learn either!
The solution is to bring the inputs into the range {-0.5,0.5} and to shift the sigmoid by 0.5 as well. Such a sigmoid, [1/(1+Exp(-x))-0.5], with range {-0.5,0.5}, cuts the convergence time by 30-50%.
The only problem then is bringing the input vector into the range {-0.5,0.5}; it will probably have to be normalized. I tried to do this, but for some reason the sigmoid result always came out positive. I would like to hear your opinion on this.
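A small sketch of that shifted sigmoid and one possible min-max normalization of the inputs into [-0.5, 0.5], in C; the function names are mine, not from the library discussed here. Note that the shifted sigmoid does go negative for x < 0, so an always-positive result usually means the inputs themselves never became negative (i.e. were not centered):

#include <math.h>

/* sigmoid shifted to the range (-0.5, 0.5) */
double shifted_sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x)) - 0.5;
}

/* min-max normalization of an input vector into [-0.5, 0.5] */
void normalize_inputs(double *v, int n)
{
    double lo = v[0], hi = v[0];
    int i;

    for (i = 1; i < n; i++)
    {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    if (hi == lo) { for (i = 0; i < n; i++) v[i] = 0.0; return; }

    for (i = 0; i < n; i++)
        v[i] = (v[i] - lo) / (hi - lo) - 0.5;
}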

4) Now about how the inputs are set. Perhaps this process should be automated. What do you think of the following approach: at the output we expect the values 100, 010, 001, as usual.
To set the output array automatically, I suggest taking, for each bar, the ratio of the maximum and minimum price over the following interval (for example, over 5000 one-minute bars). The value of this ratio shows where the price has gone: around 1 means flat, above 1 means up, between 0 and 1 means down. But I think the optimal partition for the analysis is not [0, +inf) but the three ranges (-inf, -A], [-A, A], [A, +inf), which incidentally would correspond to our output vectors.
The network inputs will be the last K values of N moving averages (or the differences between the MA and the average bar price). So there will be N*K inputs in total.
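One possible reading of point 4 as a hedged C sketch: N, K, the flat threshold A and all names are placeholders of mine, and the "ratio" is interpreted as the upward move divided by the downward move over the coming interval, so this is an illustration rather than the proposal itself:

#define N_MA     3      /* number of moving averages                */
#define K_VALUES 5      /* last K values of each MA                 */
#define A        0.2    /* dead zone around 1.0 that counts as flat */

/* target vector 100 / 010 / 001 from the coming interval's extremes */
void make_target(double future_high, double future_low, double close,
                 double *out /* out[3] */)
{
    double up   = future_high - close;
    double down = close - future_low;
    double r    = (down > 0.0) ? up / down : 2.0;  /* ~1 flat, >1 up, <1 down */

    out[0] = out[1] = out[2] = 0.0;
    if      (r > 1.0 + A) out[0] = 1.0;   /* up   */
    else if (r < 1.0 - A) out[2] = 1.0;   /* down */
    else                  out[1] = 1.0;   /* flat */
}

/* inputs: the last K values of each of the N moving averages,
   taken as MA minus the bar's average price to keep the scale small */
void make_inputs(const double ma[N_MA][K_VALUES], const double *avg_price,
                 double *in /* in[N_MA * K_VALUES] */)
{
    int i, k;
    for (i = 0; i < N_MA; i++)
        for (k = 0; k < K_VALUES; k++)
            in[i * K_VALUES + k] = ma[i][k] - avg_price[k];
}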

I am attaching a somewhat lighter and clearer version of the base code (a library of your functions). Generally speaking, at such an early stage of development there is probably no need to build a complete Expert Advisor product. Let's first get the calculation logic clear in a script, test everything, and then write a standard indicator on top of it. From there we can move further: feedback connections, committees of networks, and probably a lot of other interesting things.

P.S.
I hope you will keep sharing your developments, so I have a small request: if you don't mind, please turn off "Insert spaces" in the editor options, the code is a bit hard to read as it is. Looking forward to the new version. I am open to suggestions for joint testing.

Files:
 
sergeev wrote >>

2 YuraZ


2) When you pass the output to the sigmoid function, you subtract the _threshold parameter from the sum, although according to the literature this threshold input is not -1 but +1, so it should be added, not subtracted. All the more so since, when adjusting the weights, you feed in exactly +1 and not -1.
Actually, I have experimented with this threshold and momentum, and it turns out they really do speed up the calculations -- the time drops by a factor of several.

3) I was also interested in the sigmoid function. As I understand it, its particular parameters come from your practical experience in this field, but I assume you have read Wasserman, who writes that the {0,1} range is not optimal. The size of a weight correction is proportional to the output level, and a zero output level means the weight does not change. And since with binary input vectors half of the values are zero on average, the weights they are tied to do not learn either!
The solution is to bring the inputs into the range {-0.5,0.5} and to shift the sigmoid by 0.5 as well. Such a sigmoid, [1/(1+Exp(-x))-0.5], with range {-0.5,0.5}, cuts the convergence time by 30-50%.
The only problem then is bringing the input vector into the range {-0.5,0.5}; it will probably have to be normalized. I tried to do this, but for some reason the sigmoid result always came out positive. I would like to hear your opinion on this.

2. The original formula looks like this: S[j] = Sum(i)( y[i]*w[i,j] ) - t[j]. That is, the threshold is subtracted, which is why it is called a threshold in the first place. And the two minuses in the recalculation formula give a plus, so there is no error in how the threshold is used and recalculated.

If I am wrong, the author can correct me.
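The sign question in code form, as a hedged sketch in C (my own illustration of the usual convention, not the code from the attached library):

#include <math.h>

/* neuron output with the "minus threshold" convention:
   S[j] = Sum_i( y[i]*w[i][j] ) - t[j] */
double neuron_out(const double *y, const double *w, double t, int n)
{
    double s = -t;
    int i;
    for (i = 0; i < n; i++) s += y[i] * w[i];
    return 1.0 / (1.0 + exp(-s));
}

/* An ordinary weight is updated as  w += rate * delta * input.
   The threshold behaves like a weight on a constant input of -1, so its
   update carries a second minus; together with the minus inside S the
   result is the same as using a +1 bias input with an added weight. */
void update_threshold(double *t, double delta, double rate)
{
    *t += rate * delta * (-1.0);
}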

3. That is not a sigmoid, it is a half-sigmoid. The convergence time with these functions depends on the data fed into the network. It may well turn out that the bisigmoid converges many times faster, but on different data you may get a completely different picture.
