How do we choose the best activation function in neural networks? - Trading Systems

Chris70 2019.11.30 12:29 #271

In the meantime I will share some new insights about neural networks, supported by a little experiment.

I have talked about activation functions every now and then. To recap: they are the functions that define how much a neuron "fires" relative to the sum of its weighted inputs. This is why they are not just some small hyperparameter detail, but at the core of how any neural network operates.
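A minimal sketch of that idea in Python (the function name `neuron_output` and the example numbers are illustrative, not from the post): a neuron adds up its weighted inputs plus a bias and passes the sum through its activation function, so the same weighted sum can produce different outputs depending on the activation.

```python
import math
from typing import Callable, Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float,
                  activation: Callable[[float], float]) -> float:
    """Return activation(sum_i w_i * x_i + bias) for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * xi for w, xi in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def relu(z: float) -> float:
    return max(0.0, z)


x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, 0.1]
# Same weighted sum (about 0.1), but how much the neuron "fires" depends on the activation
print(neuron_output(x, w, 0.0, math.tanh))
print(neuron_output(x, w, 0.0, relu))
```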

In general, pretty much any activation function can be used to train a neural network, and the choice usually doesn't influence the final result of the trained network by much. But then: how do we choose the best activation function when there are so confusingly many of them out there? Many roads lead to Rome, but some take unnecessary detours.

For the output neurons, the possible answers are usually already dictated to some extent by the task the network has to solve. For example, classifiers usually use softmax; if we expect output values between -1 and +1, the tanh function fits this range; for 0 to 1, the sigmoid function; and for unbounded outputs, simply the identity function f(x)=x.
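These output choices can be sketched in a few lines of Python (a minimal illustration, not the author's code; subtracting the maximum inside softmax and the two-branch sigmoid are common numerical-stability tricks, added here as assumptions):

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Classifier outputs: positive values that add up to 1.0."""
    m = max(logits)                          # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Outputs between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)                          # safe branch for large negative x
    return e / (1.0 + e)


def identity(x: float) -> float:
    """Unbounded outputs."""
    return x


# tanh (outputs between -1 and +1) is available directly as math.tanh
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(math.tanh(3.0), sigmoid(3.0), identity(3.0))
```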

But what about the hidden neurons, i.e. all those neurons in the middle layers, between input and output layer?

In general, there are some qualities that we expect from a good activation function:

- monotonic: i.e. steadily going up or down (the sine function with all its hills and valleys would be one example that doesn't fit this criterion)

- differentiable: i.e. no "flat" segments without any "slope" (the ReLU function is an example: its slope is zero over the entire negative range, and it is not differentiable at exactly x=0)

- no saturation: i.e. not asymptotically approaching a limit (like the sigmoid function, which is bounded between 0 and 1 but never exactly reaches 0 or 1); again, this can be a problem with differentiation during backpropagation, because in the saturated regions the gradient becomes extremely small (it can even drop below DBL_MIN), so weight updates vanish, numerical errors such as "not a number" (NaN) become possible, and training can reach a point where it becomes impossible.

- finite range: this conflicts with "no saturation" (a bounded function has to saturate somewhere), but an unbounded range can cause problems, too: instead of becoming extremely small, numbers can sometimes explode to infinity, especially in combination with a high learning rate

- going through zero, i.e. f(0)=0

- non-linearity

- near-linear behavior in the area around zero

There is no activation function that fits all criteria! We always need to make some kind of compromise.
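Some of these criteria can be checked numerically. The sketch below is only a rough illustration (grid size, interval and step `h` are arbitrary assumptions): it tests monotonicity on a grid, whether f(0)=0, and the slope around zero. It also shows saturation in practice: written as s*(1-s), the sigmoid gradient at x=40 is exactly 0.0 in double precision, because s rounds to 1.0.

```python
import math
from typing import Callable, Dict


def check_properties(f: Callable[[float], float], lo: float = -5.0, hi: float = 5.0,
                     n: int = 1001, h: float = 1e-6) -> Dict[str, object]:
    """Rough numerical checks of some activation-function criteria on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    steps = list(zip(ys, ys[1:]))
    monotonic = all(b >= a for a, b in steps) or all(b <= a for a, b in steps)
    return {
        "monotonic": monotonic,
        "f(0)=0": abs(f(0.0)) < 1e-12,
        # central difference; close to 1.0 means near-linear behaviour around zero
        "slope_at_0": (f(h) - f(-h)) / (2 * h),
    }


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, written as s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


print(check_properties(math.tanh))               # monotonic, f(0)=0, slope about 1
print(check_properties(math.sin))                # not monotonic on [-5, 5]
print(check_properties(lambda x: max(0.0, x)))   # ReLU: central difference gives 0.5 (0 left, 1 right)
print(sigmoid_grad(0.0), sigmoid_grad(40.0))     # 0.25, then 0.0 (saturated)
```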

What I was especially interested in now is the question: which function is the most effective for training? Meaning: How can we most effectively minimize the error function (= loss function, = cost function) in terms of error reduction per epoch?

In order to answer that, I scheduled training runs with different hidden neuron activation functions while leaving all other parameters the same. The network predicts the direction of the next range bar, as I described a few posts above.

Model: multilayer perceptron, input layer with 760 neurons, 4 hidden layers, 1 softmax output layer.

I ran each training over 5 epochs and compared how much the cost function went down within that time, in order to get an effectiveness ranking. Some activation functions are standard, some are my custom adaptations. I think this comparison can be quite helpful for other people working with neural networks or google searching this topic in the future.
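The comparison method can be sketched with NumPy. This is not the author's setup (his model has 760 inputs, 4 hidden layers and range bar data); it assumes a hypothetical synthetic non-linear task, a single hidden layer, plain mini-batch SGD with cross-entropy loss and He-style initialisation. The fixed seed keeps initial weights and batch order identical for every activation, so only the activation changes, and the ranking uses the loss reduction over 5 epochs.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# name -> (activation, derivative with respect to the pre-activation z)
ACTIVATIONS = {
    "relu": (lambda z: np.maximum(z, 0.0), lambda z: (z > 0).astype(float)),
    "tanh": (np.tanh, lambda z: 1.0 - np.tanh(z) ** 2),
    "sigmoid": (_sigmoid, lambda z: _sigmoid(z) * (1.0 - _sigmoid(z))),
}


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # stabilise exp
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mean_cross_entropy(f, X, y, W1, b1, W2, b2):
    p = softmax(f(X @ W1 + b1) @ W2 + b2)
    return float(-np.mean(np.log(p[np.arange(len(X)), y] + 1e-12)))


def loss_per_epoch(act, X, y, hidden=16, epochs=5, lr=0.1, batch=32, seed=0):
    """Full-data loss before training and after each epoch of mini-batch SGD."""
    f, df = ACTIVATIONS[act]
    rng = np.random.default_rng(seed)          # same init and batch order for every activation
    n_in, n_out = X.shape[1], int(y.max()) + 1
    W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_out))
    b2 = np.zeros(n_out)
    Y = np.eye(n_out)[y]                       # one-hot targets
    losses = [mean_cross_entropy(f, X, y, W1, b1, W2, b2)]
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            z1 = X[idx] @ W1 + b1
            h = f(z1)
            p = softmax(h @ W2 + b2)
            d_out = (p - Y[idx]) / len(idx)    # gradient of mean cross-entropy w.r.t. logits
            d_hid = (d_out @ W2.T) * df(z1)    # backpropagate through the hidden activation
            W2 -= lr * (h.T @ d_out)
            b2 -= lr * d_out.sum(axis=0)
            W1 -= lr * (X[idx].T @ d_hid)
            b1 -= lr * d_hid.sum(axis=0)
        losses.append(mean_cross_entropy(f, X, y, W1, b1, W2, b2))
    return losses


# Hypothetical toy task (not range bars): label = 1 if x0 * x1 > 0
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(512, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

curves = {name: loss_per_epoch(name, X, y) for name in ACTIVATIONS}
ranking = sorted(curves, key=lambda n: curves[n][0] - curves[n][-1], reverse=True)
for name in ranking:
    c = curves[name]
    print(f"{name:8s} loss {c[0]:.4f} -> {c[-1]:.4f} (reduction {c[0] - c[-1]:.4f})")
```

Results on such a toy task say nothing definitive about the ranking in the post; the sketch only shows how to keep everything but the activation fixed.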

Here is the ranking (most effective ones on top):

activation function	formula / notes / description
log rectifier (custom)	0 for x<=0, ln(x) for x>0
tanh rectifier (custom)	0 for x<0, tanh(x) for x>=0
leaky log rectifier (custom)	0.01x for x<=0, ln(x) for x>0
ReLU (rectified linear unit)	0 for x<0, f(x)=x for x>=0
ELU (exponential linear unit)	0.01*(exp(x)-1) for x<0, f(x)=x for x>=0
softmax (normalized exponential)	result depends also on the other neurons within the same layer! all results in this layer altogether add up to 1.0 ("100%") --> can be interpreted as relative probabilities
semi-differentiable hardstep (custom)	0 for x<0, 1+0.01x for x>0
ISRLU (inverse square root linear unit)	f(x)=x for x>=0, x/sqrt(1+alpha*pow(x,2)) for x<0, with alpha=1.0
ramp	0 for x<-1, f(x)=x for x betw. -1 to 1, 0 for x>1
bent identity	(sqrt(pow(x,2)+1)-1)/2+x
identity function	f(x)=x --> can't solve non-linear problems! A worse final result at convergence is to be expected!
inverse hyperbolic sine (area sinus hyperbolicus)	arsinh(x)
log symmetric (custom)	-ln(1-x) for x<0, 0 for x=0, ln(x+1) for x>0
arcus tangent (arctan)	atan(x)
sinusoid	sin(x)
cardinal sine (sinc)	1 for x=0, sin(x)/x for x!=0
oblique tanh (custom)	tanh(x)+0.01x
oblique sigmoid (custom)	1/(1+exp(-x))+0.01x
sigmoid ("logistic")	1/(1+exp(-x))
hyperbolic tangent (tanh)	tanh(x)
inverse square root unit (ISRU)	x/sqrt(1+alpha*pow(x,2)) , with alpha=1.0
gaussian	exp(-pow(x,2))
softplus	ln(1+exp(x))
softsign (Elliot)	x/(1+abs(x))
leaky differentiable hardstep (custom)	0.01x for x<=0, 1+0.01x for x>0
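A subset of the table can be written down as plain Python functions for experimentation. This is a sketch, not the code used for the ranking; for ELU, ISRLU, softsign and sinc it follows the standard definitions (ELU as alpha*(exp(x)-1) with alpha=0.01, ISRLU with the inverse-square-root branch for negative inputs, softsign as x/(1+|x|), sinc(0)=1). The `log1p` and softplus rewrites are numerical-precision choices.

```python
import math
from typing import Callable, Dict


def relu(x: float) -> float:
    return x if x >= 0 else 0.0


def tanh_rectifier(x: float) -> float:
    return math.tanh(x) if x >= 0 else 0.0


def elu(x: float, alpha: float = 0.01) -> float:
    # alpha * (exp(x) - 1) for x < 0, so the function is continuous at 0
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def isrlu(x: float, alpha: float = 1.0) -> float:
    # identity for x >= 0, inverse-square-root branch for x < 0
    return x if x >= 0 else x / math.sqrt(1.0 + alpha * x * x)


def isru(x: float, alpha: float = 1.0) -> float:
    return x / math.sqrt(1.0 + alpha * x * x)


def bent_identity(x: float) -> float:
    return (math.sqrt(x * x + 1.0) - 1.0) / 2.0 + x


def log_symmetric(x: float) -> float:
    # ln(x + 1) for x >= 0, -ln(1 - x) for x < 0; log1p keeps precision near zero
    return math.log1p(x) if x >= 0 else -math.log1p(-x)


def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(x) / x   # the limit of sin(x)/x at 0 is 1


def oblique_tanh(x: float) -> float:
    return math.tanh(x) + 0.01 * x


def softplus(x: float) -> float:
    # ln(1 + e^x) rewritten as max(x, 0) + ln(1 + e^-|x|) to avoid overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softsign(x: float) -> float:
    return x / (1.0 + abs(x))


def leaky_hardstep(x: float) -> float:
    return 1.0 + 0.01 * x if x > 0 else 0.01 * x


HIDDEN_ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "ReLU": relu, "tanh rectifier": tanh_rectifier, "ELU": elu, "ISRLU": isrlu,
    "ISRU": isru, "bent identity": bent_identity, "log symmetric": log_symmetric,
    "sinc": sinc, "oblique tanh": oblique_tanh, "softplus": softplus,
    "softsign": softsign, "leaky differentiable hardstep": leaky_hardstep,
}

for name, f in HIDDEN_ACTIVATIONS.items():
    print(f"{name:30s}" + " ".join(f"{f(v):+.3f}" for v in (-2.0, -0.5, 0.0, 0.5, 2.0)))
```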

What can we learn from all this? The rectifier types (including some custom versions) are the most effective at getting the network's error down. The "traditional" tanh and sigmoid functions are found only in the lower part of the ranking. This doesn't come as a surprise; the intriguingly simple ReLU function is used much more nowadays than in the early days of neural networks, when people thought activation functions needed a fancy shape for good results. On the plus side, there is much less to calculate, which helps with overall performance.

I hope this was interesting for some of you guys,

Cheers,

Chris.


Icham Aidibe 2019.11.30 12:32 #272

Chris70:

Data science wasn't invented for being convenient ;-)

Technology's reason for being is convenience, ever since and forever

Young Ho Seo:

@Chris70

I really like your comment.

I am also a data scientist of sorts, as well as a financial trader, I guess :).

Here is my table outlining five important price patterns in the financial market.

From the table, you will see the price patterns ordered from simple (left) to complex (right).

The complexity of these price patterns also roughly matches the number of cycle periods we can describe with a Fourier transform or some other kind of cycle study (you can see this at the top of the price pattern table).

Importantly, many traders and investors are trying to capture these patterns to make money.

For many math models and trading strategies, the problem comes down to modeling the fifth regularity, the fractal wave, because it possesses an infinite number of cycles (i.e. repeating patterns).

To that end, I will add just a short comment here.

The value of the X3 pattern is not just on the data reduction side.

The X3 Pattern framework is one of the few scale-independent tools, along with the wavelet transform, which means you can properly model the repeating patterns inside price series using the X3 pattern framework.

Many other scale-dependent tools will break down with forex and stock market data for this reason.

There are only a few scale-independent tools out there. The X3 pattern is one of them.

In this respect, the wisdom of financial traders goes back nearly 100 years. Even before the personal computer became popular, traders including Ralph Nelson Elliott, H.M. Gartley and other Fibonacci traders knew how to deal with the market.

If you doubt my comment, I recommend calculating the fractal dimension of the Forex and stock markets.

You will see how precisely the fractal dimension describes the market and why you need scale-independent tools for your trading.

You will find a lot of meaningful information there. But it may be just a starting point, I guess.

https://en.wikipedia.org/wiki/Fractal_dimension
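One common way to estimate the fractal dimension of a price series is Higuchi's method; it is only one of several estimators (box counting is another) and not necessarily what the poster had in mind. A minimal NumPy sketch, where `k_max` is an arbitrary assumption: the estimate is about 1.0 for a straight line, about 1.5 for a Brownian random walk and approaches 2.0 for white noise.

```python
import numpy as np


def higuchi_fd(x, k_max: int = 10) -> float:
    """Higuchi estimate of the fractal dimension of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):                     # start offsets 0 .. k-1
            sub = x[m::k]                      # x[m], x[m+k], x[m+2k], ...
            steps = len(sub) - 1
            if steps < 1:
                continue
            curve = np.abs(np.diff(sub)).sum()
            lengths.append(curve * (n - 1) / (steps * k) / k)   # Higuchi normalisation
        log_inv_k.append(np.log(1.0 / k))
        log_len.append(np.log(np.mean(lengths)))
    slope, _intercept = np.polyfit(log_inv_k, log_len, 1)  # L(k) ~ k^(-D)
    return float(slope)


rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=10_000))
print(higuchi_fd(np.arange(1000.0)))   # straight line: 1.0
print(higuchi_fd(random_walk))         # Brownian motion: close to 1.5
```

Note that a plain random walk already yields a non-integer dimension, so a fractal dimension around 1.5 alone does not distinguish structured prices from noise.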

@Young Ho Seo Most of us see only pictures in these patterns :-)

Chris70 2019.11.30 13:20 #273

@Young Ho Seo

For many math models and trading strategies, the problem comes down to modeling the fifth regularity, the fractal wave, because it possesses an infinite number of cycles (i.e. repeating patterns).

The "fifth regularity" is a nice theory that is proven empirically wrong.

In this respect, the wisdom of financial traders goes back nearly 100 years. Even before the personal computer became popular, traders including Ralph Nelson Elliott, H.M. Gartley and other Fibonacci traders knew how to deal with the market.

Fibonacci was right about his numbers in nature and math. Abusing these findings for trading is wrong.

Elliott and Gartley: again, wrong.

How can I be so impertinent to say all this? Again, look at empirical distribution curves of retracements and extrapolations and check if there is a higher probability for the price to turn at these numbers or at least within an area around these numbers.

Simple result: there isn't! No spikes, no humps, just a smooth curve! This makes it a fact instead of a matter of belief. Fibonacci is a myth in trading. Period. Don't hijack a proven mathematical concept for something it wasn't made for. And Elliott and Gartley are elaborations built on Fibonacci and are just as wrong. Even a nice theory is useless if it fails against real-world statistics. I'm sure this is hard to accept for somebody who has worked with these concepts a lot.
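The check described above can be sketched in Python. This is one possible procedure under stated assumptions, not necessarily the one used in the post: swings are found with a simple percentage zigzag (the `threshold` value is a hypothetical choice), the retracement of each swing is |C-B|/|B-A| for three consecutive pivots A, B, C, and the ratios are binned. The usage example runs on a synthetic random walk only as a placeholder for real price data.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def zigzag_pivots(prices: Sequence[float], threshold: float) -> List[int]:
    """Indices of alternating swing highs/lows. A swing end is confirmed once price
    reverses by more than `threshold` (0.03 = 3 %) from the running extreme.
    The first bar is the starting pivot; the last, unconfirmed leg is dropped."""
    pivots = [0]
    trend = 0          # 0 = undecided, +1 = rising leg, -1 = falling leg
    ext = 0            # index of the running extreme of the current leg
    for i in range(1, len(prices)):
        p = prices[i]
        if trend == 0:
            if p >= prices[0] * (1 + threshold):
                trend, ext = 1, i
            elif p <= prices[0] * (1 - threshold):
                trend, ext = -1, i
        elif trend == 1:
            if p > prices[ext]:
                ext = i                                   # rising leg extends
            elif p <= prices[ext] * (1 - threshold):
                pivots.append(ext)                        # swing high confirmed
                trend, ext = -1, i
        else:
            if p < prices[ext]:
                ext = i                                   # falling leg extends
            elif p >= prices[ext] * (1 + threshold):
                pivots.append(ext)                        # swing low confirmed
                trend, ext = 1, i
    return pivots


def retracement_ratios(prices: Sequence[float], pivots: Sequence[int]) -> List[float]:
    """|C - B| / |B - A| for every three consecutive pivots A, B, C."""
    ratios = []
    for a, b, c in zip(pivots, pivots[1:], pivots[2:]):
        leg = abs(prices[b] - prices[a])
        if leg > 0:
            ratios.append(abs(prices[c] - prices[b]) / leg)
    return ratios


def ratio_histogram(ratios: Sequence[float], width_pct: int = 5,
                    max_ratio: float = 2.0) -> Dict[float, float]:
    """Share of all ratios per bin (ratios rounded to 0.01, bins of width_pct hundredths)."""
    counts = Counter(int(round(r * 100)) // width_pct for r in ratios if r < max_ratio)
    n = len(ratios)
    return {round(b * width_pct / 100, 2): counts[b] / n for b in sorted(counts)}


# Placeholder data: a synthetic random walk; real prices would be loaded here instead
rng = random.Random(7)
prices = [100.0]
for _ in range(20_000):
    prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.005)))
ratios = retracement_ratios(prices, zigzag_pivots(prices, threshold=0.01))
for bin_start, share in ratio_histogram(ratios).items():
    if 0.3 <= bin_start <= 0.7:
        print(f"{bin_start:.2f}-{bin_start + 0.05:.2f}: {share:.3f}")
```

A Fibonacci effect would show up as bins around 0.382 or 0.618 that stand out from their neighbours by more than the sampling noise.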

It's like with religion or art: it's nice what we believe in or what is considered beautiful, but all that doesn't help with trading if real world numbers don't care. They don't.

I'm not complaining about missing theory, I'm missing the proof.

The X3 Pattern framework is one of the few scale-independent tools, along with the wavelet transform, which means you can properly model the repeating patterns inside price series using the X3 pattern framework.

Fractals, wavelets, Fourier... all these methods that can be applied to time series are basically filtering methods. Simplification and/or noise reduction is what they are made for. Getting rid of a certain amount of the original information is part of their purpose. That's all okay, if this is what we want. Which leads to:
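To make the "filtering" point concrete, here is a minimal Fourier low-pass sketch with NumPy (the window length, frequencies and cutoff `keep` are hypothetical): the series is rebuilt from only its lowest-frequency components, so everything above the cutoff is discarded on purpose. The FFT also treats the window as periodic, which is an assumption real price windows rarely satisfy.

```python
import numpy as np


def fourier_lowpass(series, keep: int):
    """Rebuild a series from its `keep` lowest-frequency Fourier components
    (bin 0 = mean); all higher frequencies are thrown away."""
    spectrum = np.fft.rfft(np.asarray(series, dtype=float))
    spectrum[keep:] = 0.0
    return np.fft.irfft(spectrum, n=len(series))


t = np.arange(256)
slow = np.sin(2 * np.pi * 2 * t / 256)          # 2 cycles over the window
fast = 0.5 * np.sin(2 * np.pi * 40 * t / 256)   # 40 cycles, treated as "noise"
smoothed = fourier_lowpass(slow + fast, keep=10)
print(np.max(np.abs(smoothed - slow)))          # tiny: the fast component is gone
```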

The value of the X3 pattern is not just on the data reduction side.

Okay. As I said, I see much value in your pattern notation method for the manual trader, because he/she can get a systematic approach out of an endless series of price/volume ticks.

But how does this help with machine learning? It's all okay if you just use it for what it is: a notation system. There is nothing bad about using it or even feeding this information into prediction models. It's like using indicators: it's all okay as long as we are aware of their vast limitations.

Machine learning already offers many better methods for automated pattern recognition and clustering (https://en.wikipedia.org/wiki/Pattern_recognition).

What I'd have a problem with is the assumption that any arbitrarily imposed pattern definition could be superior to an automated approach like CNN feature maps. I guess it can be boiled down to "defining a model versus (auto-)searching for a model".

Young Ho Seo 2019.11.30 13:23 #274

Icham Aidibe:

Technology's reason for being is convenience, ever since and forever

@Young Ho Seo Most of us see only pictures in these patterns :-)

Your concern is perfectly understandable. :)

For price pattern trading to come this far, there were many hiccups over the last several decades.

First of all, automated price pattern detection algorithms suffered visually from what traders call repainting issues.

Presenting a repainting indicator to scientifically minded people is a disadvantage, even though many financial traders use repainting indicators for their trading.

Hence, I spent a few years of effort breaking the repainting barrier and presenting more scientific evidence about these price patterns.

This video is now outdated since we have moved on to more advanced algorithms, but it is good enough to show that price patterns like Harmonic Pattern, Elliott Wave pattern and X3 pattern are measurable.

https://www.youtube.com/watch?v=7gwph8BadFI

What I have done so far is connect these price patterns to science.

That way we can explain more about these patterns and benefit more from them.

To many people, Elliott Wave, Harmonic patterns and X3 patterns are just trading tools, not science. That is strange, but the reason is understandable: their outcomes are hard to visualize.

But these patterns are science with measurable outcomes, too.

My journey up to here has taken nearly 10 years of groundbreaking work, because there was no guide or previous work on what I am doing.

Hopefully the next person will benefit from my previous research.

I really appreciate the bright people in this forum.

Kindest regards.


Chris70 2019.11.30 13:32 #275

Young Ho Seo:

To many people, Elliott Wave, Harmonic patterns and X3 patterns are just trading tools, not science. That is strange, but the reason is understandable: their outcomes are hard to visualize.

People like me ;-)

"Science" for most people usually implies: hypothesis --> method definition --> experiment --> results --> acceptance or rejection of the hypothesis --> conclusion

Gartley, Elliott and Fibonacci methods (in trading) don't go beyond the point of the method, because any further attempts failed.

Sorry, that's no science.


Alain Verleyen 2019.11.30 14:00 #276

Chris70:

People like me ;-)

"Science" for most people usually implies: hypothesis --> method definition --> experiment --> results --> acceptance or rejection of the hypothesis --> conclusion

Gartley, Elliott and Fibonacci methods (in trading) don't go beyond the point of the method, because any further attempts failed.

Sorry, that's no science.

Most people on this forum confuse science and marketing.

Young Ho Seo 2019.11.30 14:12 #277

Replying to Chris70 (#273):

I am not worshipping Elliott and Gartley, but I worked to connect their price patterns to the fractal science established by Benoit Mandelbrot.

It might be a little impulsive to disregard the work of Benoit Mandelbrot, I guess.

https://en.wikipedia.org/wiki/Benoit_Mandelbrot

Everything has a start and an end. Each phenomenon we find in real life can be connected to a particular piece of science.

This part of price pattern theory has not yet been fully developed by scientists.

But this does not mean that we cannot connect it to science.

When you talk about Fibonacci ratios, why do you look at the distribution of the price series?

What relationship does the distribution of the price series have with the Fibonacci ratios?

Fibonacci ratios are not about the data distribution; they are the ratio of two price swings in the price series.

These two price swings form one fractal triangle.

To get the evidence, all you have to do is count the occurrences of the ratios (i.e. count each fractal triangle).

For example, this is what I did.

The left number in the box is the Fibonacci ratio and the right number in the box is its frequency of occurrence.

For example, 0.55, 0.313 in the EURUSD daily timeframe shows how often the ratio 0.55 is found among all fractal triangle patterns. In this case, it is 31.3%.

Likewise, 0.5, 0.333 in the GBPUSD daily timeframe shows how often the ratio 0.5 is found among all fractal triangle patterns. In this case, it is 33.3%.

However, this was measured around the 0.55 ratio, not at exactly 0.55.

Hence, we can only calculate the tendency, not the exact number.

For example, the frequency of the ratio 0.5 is higher than that of other ratios. This way, you can see humps around particular ratios.

Some of them are very close to the Fibonacci ratios or the ratios we use to detect price patterns like Harmonic Patterns, Elliott Wave patterns, etc.

This way, we can accumulate evidence for the price patterns.

As I said, there is a start and an end. We still need much more work utilizing fractal science.

Benoit Mandelbrot did great work on fractal science, but his fractal work on financial markets started rather late.

It was never fully developed. Unfortunately, he is no longer around. That is just a shame.

But he left us the fractal dimension, which is the key concept for studying price patterns.
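The counting described above (share of swing ratios falling within a band around a target ratio) can be sketched as follows. The tolerance `tol` and the idea of comparing against neighbouring non-target control values are assumptions added for illustration: a hump only means something if the share at the target clearly exceeds the share at nearby values measured with the same band width.

```python
from typing import Dict, Iterable, Sequence


def share_near(ratios: Sequence[float], targets: Iterable[float],
               tol: float = 0.025) -> Dict[float, float]:
    """Fraction of all swing ratios lying within +/- tol of each target ratio."""
    n = len(ratios)
    if n == 0:
        raise ValueError("need at least one ratio")
    return {t: sum(1 for r in ratios if abs(r - t) <= tol) / n for t in targets}


# Hypothetical ratios; in practice they come from the measured fractal triangles
sample = [0.5, 0.52, 0.61, 0.9]
print(share_near(sample, [0.5, 0.618]))            # target ratios
print(share_near(sample, [0.45, 0.7]))             # neighbouring control values
```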


Young Ho Seo 2019.11.30 15:19 #278

Icham Aidibe:

Wait wait wait ...

<Deleted>

But you have to understand someone doing groundbreaking work like me. :)

What is the point of saying the Elliott wave pattern gives me a success rate of 67.5% when the Elliott wave pattern does not have a definite structure and everyone draws a different Elliott wave pattern of their own in their head?

What I am saying is that before anything can be measured or quantified, we need some definition or definite structure. That is the first step.

It is the same for price patterns like triangles and wedges.

These price patterns are too loosely structured as pictures, and sometimes they are almost scribbled by traders.

The X3 pattern framework is a kind of preliminary work for anyone wanting to measure their outcomes and quantify them precisely.

This is probably the first but essential step.

Without this sort of groundbreaking work, we cannot really move on to the next stage of science in this cycle:

hypothesis --> method definition --> experiment --> results --> acceptance or rejection of the hypothesis --> conclusion

At least with the X3 pattern framework, you cannot cheat, because it provides the pattern structure without any grey areas.

The focus of my work is to get rid of all subjectivity and the grey areas in price pattern trading.

This work was not done overnight but over many years, because you cannot just create your own world that other people do not understand.

You have to read, to practice, and to write until you have some clear picture of it. That takes really long. :)

At least some of you grasped its application really fast. I would say that is not bad.


Chris70 2019.11.30 15:34 #279

Young Ho Seo: ...

I won't discredit the work of Mandelbrot. Especially not all that stuff about fractal geometry in nature. Who am I to do that? That guy was a genius.

But let's also mention what his main merits for the financial world were: his main contribution for decades was the discovery that the probability distribution of price movements can best be described as an alpha 1.7 Levy stable distribution instead of a normal distribution, which would be expected if the random walk theory (Brownian motion, coin toss probabilities...) were entirely true. This was quite a revolutionary statement at the time.
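For reference, the stable family is usually defined through its characteristic function (standard parameterisation for α ≠ 1, with tail index α, skewness β, scale c and location μ):

```latex
\varphi(t) = \mathbb{E}\left[e^{itX}\right]
  = \exp\!\Big( i\mu t - |c\,t|^{\alpha}\big(1 - i\beta\,\operatorname{sgn}(t)\tan\tfrac{\pi\alpha}{2}\big) \Big),
  \qquad 0 < \alpha \le 2,\ \alpha \neq 1
```

With α = 2 the tangent term vanishes (tan π = 0) and φ(t) = exp(iμt − c²t²), i.e. a normal distribution with variance 2c². For α < 2 the tails decay like P(|X| > x) ∝ x^(−α), so with α ≈ 1.7 the mean exists but the variance is infinite: large moves are far more frequent than a normal distribution would suggest.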

His theories about fractals in finance, as you say, were part of his late work, and I don't think they allow final conclusions yet. I accept them as, I quote, a "concept to study", though. A theory, not some kind of financial law of nature. Any other conclusion would go beyond what Mandelbrot said himself.

The distribution curves that you linked at the bottom look pretty random to me. The problem probably comes from not having enough data, as daily prices were used: how many fractals (not prices!) will you see per year, and what does that say about the accuracy of the distribution curve? The internet is full of other examples with perfectly smooth price retracement distribution curves, without the humps. Yes, I'm aware that pure price and fractals derived from price are not exactly the same, but we should at least see some irregularities. By the way: I tried this experiment a while ago, because I was curious myself.
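The sample-size concern can be quantified with the standard error of a proportion (normal approximation, treating the swings as independent, which neighbouring swings are not quite). Taking the 31.3% share quoted for EURUSD and a hypothetical n = 100 fractal triangles, since the actual count is not given:

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{\hat p\,(1-\hat p)}{n}},
\qquad
\hat p = 0.313,\ n = 100:\quad
\sqrt{\frac{0.313 \cdot 0.687}{100}} \approx 0.046
```

A 95% interval would then be roughly 0.313 ± 1.96 · 0.046, i.e. about ±9 percentage points, wide enough to make a single hump in one bin hard to distinguish from noise.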

I'm not going to convince you. That's okay.

Also, I'm not sure how independent your opinion can be, because, according to your profile, you sell several books about the topic, so there might also be a personal financial interest involved (be it on purpose or unconsciously).

"You have to read, to practice, ..." .... which I did for the last 12 years, so your assumption isn't only wrong, but also unnecessarily personal. There is no need to assume I'm less informed only because I don't share your theory.

Nevertheless, it's for sure an interesting topic, but let's keep it a bit more about neural networks and machine learning here or discuss fractal theory in a new thread, if that's okay with you.

______

@Icham Aidibe while I appreciate the joke... SERIOUSLY !!???? btw --> we can now confirm that the translate button works with Latin, too --> Machine Learning rules ;-)


Icham Aidibe 2019.11.30 18:05 #280

Chris70 (#279):

@Icham Aidibe while I appreciate the joke... SERIOUSLY !!???? btw --> we now can clarify that the translate button works with latin, too --> Machine Learning rules ;-)

Sure man... are you shocked that I need some results before giving credit to a theory that is yours but could be anyone else's?

Don't you think the least you can expect from a tool around here is mathematical coherence, a number?

I examine that picture, which is lorem-ipsum-free, for less than a minute and can benefit from it directly; that's how Elliott gains credibility as a financial theorist.

You want to say no? Okay! But then bring something consistent enough to replace Elliott!


Taking Neural Networks to the next level - page 28