Market etiquette or good manners in a minefield

 

Hi paralocus, I think you can start right here and then we'll see.

By the way, for everyone who is interested: the strategy "cut losses and let profits run", or "take profits and let losses run" (depending on whether the market is trending or flat on the chosen trading horizon), is not optimal when capital is reinvested. In that case it is more profitable to fix the result at every step and reinvest! I.e. if we have 10 consecutive profitable trades, it is more profitable to pay the brokerage commission each time and reinvest the profit than to hold a single position the whole time and save on the spread.
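A rough numerical illustration of what I mean (a sketch in Python; the 2% per step and the 0.1% round-trip cost are made-up figures, chosen only to show the effect):

# Sketch: compare holding one position through 10 profitable steps
# with closing and reopening (reinvesting) at every step.
# All figures (2% gain per step, 0.1% cost per round trip) are illustrative assumptions.

steps = 10
gain_per_step = 0.02      # assumed profit per step, as a fraction of capital
commission = 0.001        # assumed round-trip cost of re-entering (spread + fees)

# Single position held the whole time: profit accrues on the initial volume only
capital_hold = 1.0
capital_hold += steps * gain_per_step          # linear growth of the fixed volume
capital_hold -= commission                     # one round trip

# Close and reopen at each step: each step's profit is reinvested (compounded)
capital_reinvest = 1.0
for _ in range(steps):
    capital_reinvest *= (1.0 + gain_per_step)  # geometric growth of the volume
    capital_reinvest *= (1.0 - commission)     # pay the cost of re-entering

print(f"hold one position : {capital_hold:.4f}")
print(f"reinvest each step: {capital_reinvest:.4f}")

With these toy numbers the per-step reinvestment ends up slightly ahead despite paying the commission ten times.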

This is the paradox, from which it is a short step to the Bernoullization of trades, and from there to the effective use of the basic equation of trading in analytic form (unlike Vince) without any problems with parameterization.

 
Neutron >> :

Hi paralocus, I think you can start right here and then we'll see.


Thank you!

I have three perceptrons at the input (three so far), which form the first layer. The composition of the layer is as follows:

One perceptron on RSI, one on CCI and one on stochastic. All inputs and outputs are normalized to (-1 ; +1). The perceptrons are trained with the genetic optimizer using the simplest scheme - on splitting.

Now I want to add a second layer of two perceptrons, one of which is trained to buy only and the other to sell only. Question:

Is it enough for the second-layer perceptrons to be trained, so to speak, each in its own specialty while using the same data from the first layer, or

must the first-layer perceptrons also be trained separately for buying and separately for selling?

 
paralocus wrote >>

Thank you!

Not at all!

You see, paralocus, the NS is in essence a universal adder, and mathematically it makes no difference whether you sharpen the neurons in the output layer individually for sell and for buy, or build a third layer from a single neuron with hypertangent activation, whose output polarity points to buy or sell and whose amplitude gives the probability of success (the Network's confidence in it). From this point of view there is no need to specialize the first-layer neurons forcibly - they will sort themselves out in the process of training. Taking into account that the computing power of an NS does not increase when moving from a two-layer to a three-layer architecture (except in some exotic cases), that the optimal length of the training sample is proportional to the square of the number of all the weights of the network (and the sample should be as short as possible, for a prompt response to Market events), and applying Occam's razor (do not multiply entities unnecessarily), the best option seems to be a two-layer NS with one hidden layer and one output neuron.

P.S. Yes, it is important that the probability density of the input signal for the NS has zero expectation and is distributed uniformly over the interval +/-1 (a "shelf"). This markedly improves the training and the performance of the Network.

 
Neutron >> :

Not at all!

You see, paralocus, the NS is in essence a universal adder, and mathematically it makes no difference whether you sharpen the neurons in the output layer individually for sell and for buy, or build a third layer from a single neuron with hypertangent activation, whose output polarity points to buy or sell and whose amplitude gives the probability of success (the Network's confidence in it). From this point of view there is no need to specialize the first-layer neurons forcibly - they will sort themselves out in the process of training. Taking into account that the computing power of an NS does not increase when moving from a two-layer to a three-layer architecture (except in some exotic cases), that the optimal length of the training sample is proportional to the square of the number of all the weights of the network (and the sample should be as short as possible, for a prompt response to Market events), and applying Occam's razor (do not multiply entities unnecessarily), the best option seems to be a two-layer NS with one hidden layer and one output neuron.

P.S. Yes, it is important that the probability density of the input signal for the NS has zero expectation and is distributed uniformly over the interval +/-1 (a "shelf"). This markedly improves the training and the performance of the Network.

Eh! Some of it's clear, but a lot of it's new! So as not to miss anything, I'll ask as I go along...

1. The NS is in essence a universal adder, and mathematically it makes almost no difference whether you sharpen the neurons in the output layer individually for sell and for buy, or build a third layer from one single neuron with hypertangent activation, whose output polarity points to buy or sell and whose amplitude gives the probability of success (the network's confidence in it).

I understand about the adder, but a neuron with hypertangent activation - what kind of beast is that? I normalize inputs and outputs with a sigmoid, and to make sure the signal at the inputs and outputs is correct (the maxima and minima lie between -1 and +1), I have slightly rewritten the Perseptron indicator. That is, we take three input neurons and feed their outputs to a fourth one (with hypertangent activation), whose output can be read quite transparently as a probabilistic estimate of a successful outcome of a trade in the indicated direction (according to the output polarity)... right?


2. From this point of view, there is no need to specialize the first-layer neurons forcibly - they will sort themselves out in the process of training.

That is, the neurons of the input layer should simply be trained to separate the input data into "right" and "left". I am a bit confused about "they will sort themselves out in the process of training" - does that mean training only the output hypertangent neuron, or training all of the output and input neurons at once? If all at once, the genetic optimizer will not let me optimize more than 8 parameters simultaneously, while there are at least 12 of them in such a net (... not counting the parameters of the indicators) - what should I do? And if I train them separately - first each input neuron separately, and then only the output one (which is what I am doing now) - won't that be a mistake?


3. The optimal length of the training sample is proportional to the square of the number of all the weights of the network (and the sample should be as short as possible, for a prompt response to Market events).

How is that? Square all the weights of the network, then sum those squares, and the length of the training sample should be proportional to that?

About the uselessness of long samples I already know - I arrived at it, so to speak, by "scientific trial and error". I have even found the date (2008.12.02), starting from which the data is simply useless for the net - no correlation with the actual market dynamics.


4. The best option seems to be a two-layer NS with one hidden layer and one output neuron.

Here I don't understand something... If there is an input layer of neurons, an output layer of neurons, and a hidden layer of neurons, that already makes three. So why is the network called two-layer?


5. Yes, it is important that the probability density of the input signal for the NS has zero expectation and is distributed uniformly over the interval +/-1 (a "shelf"). This markedly improves the training and the performance of the Network.

I understand the need for input signal normalization myself (at the level of intuition), which is why I transform the input signal so that the output has the same form but in the -/+1 range. But how is the probability density of the normalized RSI distributed? For example, the input signal of my RSI neuron looks like this:


Is this enough, or is something else required?

P.S. I'm fine with the razor, the main thing is to understand what to cut... :-)

 
paralocus wrote >>

1. I understand about the adder, but a neuron with hypertangent activation - what kind of beast is that? I normalize inputs and outputs with a sigmoid, and to make sure the signal at the inputs and outputs is correct (the maxima and minima lie between -1 and +1), I have slightly rewritten the Perseptron indicator. That is, we take three input neurons and feed their outputs to a fourth one (with hypertangent activation), whose output can be read quite transparently as a probabilistic estimate of a successful outcome of a trade in the indicated direction (according to the output polarity)... right?

It is a neuron whose activation function (FA) is the hyperbolic tangent (range of values +/-1) - convenient for making the buy/sell trading decision, and if |FA| < const, stay out of the market.

All the neurons of an NS must have a non-linear FA (with rare exceptions - the last one may do without). Nothing depends on the specific form of the FA except the speed of training.
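Roughly, such an output neuron looks like this (a sketch; the weights, the inputs and the threshold const are arbitrary illustration values, and the +1 bias input is the one discussed further down the thread):

import math

def tanh_neuron(inputs, weights, bias_weight):
    # weighted sum plus the constant +1 bias input, squashed by the hyperbolic tangent
    s = sum(x * w for x, w in zip(inputs, weights)) + bias_weight
    return math.tanh(s)          # output lies in (-1, +1)

CONST = 0.3                      # assumed confidence threshold

def decision(out):
    if abs(out) < CONST:
        return "out of the market"
    return "buy" if out > 0 else "sell"

# hypothetical outputs of the three first-layer perceptrons (RSI, CCI, stochastic)
first_layer = [0.7, -0.2, 0.4]
out = tanh_neuron(first_layer, weights=[0.5, 0.3, 0.8], bias_weight=0.1)
print(out, "->", decision(out))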

2. From this point of view, there is no need to specialize the first-layer neurons forcibly - they will sort themselves out in the process of training.

That is, the neurons of the input layer should simply be trained to separate the input data into "right" and "left". I am a bit confused about "they will sort themselves out in the process of training" - does that mean training only the output hypertangent neuron, or training all of the output and input neurons at once? If all at once, the genetic optimizer will not let me optimize more than 8 parameters simultaneously, while there are at least 12 of them in such a net (... not counting the parameters of the indicators) - what should I do? And if I train them separately - first each input neuron separately, and then only the output one (which is what I am doing now) - won't that be a mistake?

Of course we should train all of them at once, otherwise we get the "everyone pulling in a different direction" problem. I have not dealt with the genetic optimizer, so I can't help there.
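For illustration only, "training everything at once" could look roughly like this with plain gradient descent (the ORO method mentioned below) instead of the genetic optimizer; the layer sizes, learning rate and toy data here are arbitrary assumptions, not your actual setup:

import math, random

N_IN, N_HID = 3, 3               # three indicator inputs, three hidden perceptrons
LR = 0.1

# weights include an extra slot for the constant +1 bias input
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)]

def forward(x):
    h = [math.tanh(sum(w[i] * xi for i, xi in enumerate(x)) + w[-1]) for w in w_hid]
    y = math.tanh(sum(w_out[i] * hi for i, hi in enumerate(h)) + w_out[-1])
    return h, y

def train_step(x, target):
    h, y = forward(x)
    # output-layer delta: the derivative of tanh is (1 - y^2)
    d_out = (target - y) * (1.0 - y * y)
    # hidden-layer deltas, propagated back through the output weights
    d_hid = [d_out * w_out[j] * (1.0 - h[j] * h[j]) for j in range(N_HID)]
    # update ALL weights of both layers in the same pass
    for j in range(N_HID):
        w_out[j] += LR * d_out * h[j]
    w_out[-1] += LR * d_out
    for j in range(N_HID):
        for i in range(N_IN):
            w_hid[j][i] += LR * d_hid[j] * x[i]
        w_hid[j][-1] += LR * d_hid[j]

# toy data: inputs in (-1, +1), target +1 = buy, -1 = sell (purely illustrative)
data = [([0.5, -0.3, 0.8], 1.0), ([-0.6, 0.2, -0.7], -1.0)]
for _ in range(200):
    for x, t in data:
        train_step(x, t)
print([round(forward(x)[1], 2) for x, _ in data])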

3. The optimal length of the training sample is proportional to the square of the number of all the weights of the network (and the sample should be as short as possible, for a prompt response to Market events).

How is that? Square all the weights of the network, then sum those squares, and the length of the training sample should be proportional to that?

There is an optimal length of the training sample P which minimizes the sum of the approximation error and the generalization error. This optimum is uniquely determined by the number of synapses w in the network and the dimension of the input d (the number of network inputs):

Popt = k*w*w/d, where k is a dimensionless constant of the order of 1 that accounts for the variability of the market.

The criterion of the optimum is that the network error on the test sample is comparable with the error on the training one, i.e. if a properly trained net guesses 55% correctly, it will show approximately the same result on new test data. Moreover, for such an NS there is no overtraining problem associated with increasing the number of iterations in the ORO (error backpropagation) method - there is no local minimum of the error: the function is monotonic and asymptotically tends to a constant.
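For a feel of the numbers (the w and d values below are arbitrary, just to show the arithmetic):

def optimal_training_length(w, d, k=1.0):
    # Popt = k * w^2 / d, where w is the number of synapses and d the number of inputs
    return k * w * w / d

# e.g. a hypothetical net with 16 synapses and 4 inputs:
print(optimal_training_length(w=16, d=4))   # 64.0 -> about 64 training samples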

4. The best option seems to be a two-layer NS with one hidden layer and one output neuron.

Here I'm not following... If there is an input layer of neurons, an output layer of neurons, and a hidden layer of neurons, that already makes three. So why is the network called two-layer?

It is a question of terminology. I do not single out the input layer as a separate one. So I meant an NS that has only two layers - the input layer (which is also the hidden one) and the output layer (consisting of a single neuron).

However, how is the probability density of normalized RSI distributed? Is this enough, or is something else required?

I don't know. You need to plot the probability density distribution of the series of first differences of your RSI and look at the graph - there should be a shelf with maximum amplitude +/-1.
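Something like this would do for the check (a sketch; "rsi.csv" is just a placeholder for wherever your normalized RSI series lives, and the number of bins is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

rsi_values = np.loadtxt("rsi.csv")        # placeholder: your normalized RSI series

diff = np.diff(rsi_values)                # series of first differences
plt.hist(diff, bins=50, density=True)     # estimated probability density
plt.xlim(-1, 1)
plt.title("First differences of RSI: should look like a flat shelf on +/-1")
plt.show()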

 
Neutron wrote >>

I do not single out the input layer as a separate one. So I meant an NS that has only two layers - the input layer (which is also the hidden one) and the output layer (consisting of a single neuron).

That is, the network consists of several parallel perceptrons in the first layer and one in the output layer, with the number of inputs of the output perceptron equal to the number of perceptrons in the first layer?

 
Neutron >> :

Of course we should train all of them at once, otherwise we get the "everyone pulling in a different direction" problem. I haven't dealt with the genetic optimizer, so I can't help you there.


There you go! And I was hoping to "improve" my net into a self-training one later...

Digesting the answers I got... I'll draw what I have understood.

 
FION wrote >>

So the network consists of several parallel perceptrons in the first layer and one in the output layer, with the number of inputs of the output perceptron equal to the number of perceptrons in the first layer?

That's right.

But each perceptron has a separate additional input for a constant +1 offset. This speeds up training and increases the power of the Network.

 
Neutron wrote >>

That's right.

But each perceptron has a separate additional input for a constant +1 offset. This speeds up training and increases the power of the Network.

I see. The constant offset just shifts the activation point slightly along the hypertangent curve.
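For instance (arbitrary weight and offset values, just to see the shift of the zero crossing):

import math
w, b = 1.0, 0.5                                        # arbitrary weight and +1-input offset weight
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
print([round(math.tanh(w * x), 2) for x in xs])        # no offset: centred at x = 0
print([round(math.tanh(w * x + b), 2) for x in xs])    # with offset: the curve is shifted along x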

 

Neutron wrote >>


Here's what I figured out:

The hypertangent is sorted out. Until now I used a sigmoid and had to subtract one from the result, but with th I don't need to. As far as I understand, the figure shows the optimal architecture of an NS for the market. The number of inputs is 12, and the number of synapses is 4. So, using the formula Popt = k*w*w/d we obtain 144/4 = 36... Is that 36 bars? Or 36 nearest Buy/Sell situations? Did I get it right?
