Machine learning in trading: theory, models, practice and algo-trading - page 170

 
Alexey Burnakov:

Once again: I also have dozens of models, and I also search over predictors and parameters. And each of these models shows solid gains over an 8-year period! And that is the test period. But when the "best" models selected on the test are checked on a deferred (hold-out) sample, there are surprises. That is what I mean by fitting the model to the cross-validation.

For example, you ran validation on a deferred sample. Suppose the model lost money on the deferred data. What do you do in that case? If you start tweaking parameters so that the model passes validation on the deferred sample, you effectively pull the deferred sample into your cross-validation, and the cross-validation becomes overfitted too. You can correct for that by adding a new deferred sample. But what if the model fails on that one as well? Do you select parameters again to pass the new deferred sample? It is an endless race.

Folding the deferred sample into the cross-validation and creating a new deferred sample is not a solution, just endless repetition until you get lucky and a model happens to pass the deferred validation. At that point you can stop, but that is not a forex solution, just luck smiling on you, and statistically it would still end in a losing account.
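To make the terms concrete, this is roughly the layout under discussion, as a minimal sketch in R (my illustration, not Burnakov's actual setup; the 80/20 split and the name `dataset` are arbitrary):

# Illustration only: a chronological split into the part used for training and
# cross-validation, and a deferred (hold-out) sample looked at once, at the end.
n        <- nrow(dataset)                     # `dataset` is ordered by time
cut_at   <- floor(0.8 * n)
cv_part  <- dataset[1:cut_at, ]               # training + cross-validation
deferred <- dataset[(cut_at + 1):n, ]         # the deferred sample

# The trap described above: as soon as results on `deferred` drive any further
# tuning, the deferred sample has effectively become part of the cross-validation.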

So, the question: suppose the model lost money on the deferred data. What do you do in that case?

 
Dr.Trader:

So, the question: suppose the model lost money on the deferred data. What do you do in that case?

I thought that question was too personal :)

A better question: if your model fails the test on the deferred sample, do you fold the deferred sample into the cross-validation and create a new deferred sample? Or do you do something different?


For example, I took as a basis what San Sanych has repeated a thousand times already: you need to assess the quality of the predictors. "Quality" is an elastic term. I, for instance, do it through that kind of cross-validation, and it is more a selection of predictors than of model parameters. If the model finds the same dependencies in the data no matter which part of it it is trained on, that is a strong argument that the predictors are fine (a rough sketch of that idea follows below).
What I dislike about my approach is that I have to trade an ensemble. If the dependencies really are constant, then it should be enough to train a single model on the selected predictors at the end; it would find the same dependencies again and could trade on its own. But something is still missing in my predictor selection, because a single model fails.
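Dr.Trader does not show code for this, so below is only a rough sketch of the idea as I read it: fit the same model on several disjoint time segments and check whether it keeps ranking the same predictors as important. The use of randomForest, the segment count and the column name `target` are my assumptions, not his setup.

# Sketch only (not Dr.Trader's actual procedure): fit the same model on
# several disjoint time segments and see whether the same predictors keep
# coming out as important.
library(randomForest)

# df: data.frame ordered by time, predictor columns plus a factor column `target`
check_predictor_stability <- function(df, target = "target", n_segments = 5) {
  idx      <- cut(seq_len(nrow(df)), breaks = n_segments, labels = FALSE)
  imp_list <- lapply(seq_len(n_segments), function(k) {
    part <- df[idx == k, ]
    fit  <- randomForest(x = part[, setdiff(names(part), target)],
                         y = part[[target]], ntree = 500)
    importance(fit)[, 1]                 # mean decrease in Gini (classification)
  })
  imp   <- do.call(cbind, imp_list)
  ranks <- apply(imp, 2, function(x) rank(-x))
  # predictors whose importance rank is stable across segments are the ones
  # that reflect "the same dependencies in any area" of the data
  data.frame(mean_rank = rowMeans(ranks), rank_sd = apply(ranks, 1, sd))
}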

 
Dr.Trader: quality of predictors....
From the primitive example, it follows that the properties of the target are just as important...
 
I will say even more: the target function is a VERY important part of building a model, because it is the target that lets the algorithm make a split. I have run into this problem more than once and never solved it completely, so I left the target in the form of 50 pips, just above the level of the spread...
 
Alexey Burnakov:

Once again: I also have dozens of models, and I also search over predictors and parameters. And each of these models shows solid gains over an 8-year period! And that is the test period. But when the "best" models selected on the test are checked on a deferred (hold-out) sample, there are surprises. That is what I mean by fitting the model to the cross-validation.

When you understand why, the experiment stays clean and you can keep going. When you don't, you see a multi-fold drop in quality in live trading, which is what you see 99% of the time.

All of this happens because the market goes against its own statistics extremely often...

1) First I'll show why I think so and demonstrate it

2) Then I'll explain why it happens, i.e. the mechanics of the process

Give me a couple of hours...

I will not offer any ready-made solutions, because I don't have any myself, but understanding the process is already something...

1)

==================================================================

The first thing I did was to train two deep networks with probabilistic outputs. In fact, any network will do; the main thing is that the output is not a hard class answer of "1" or "0". That is, the output will be, say, "0.13", meaning the current data belongs to class "1" with probability 0.13.

One network I trained exclusively for buys, the second exclusively for sells.

The signal (the target) for a sell is a point after which the price dropped by at least 0.2%.



The target looks like "00000000001001000000", where "1" is a reversal to sell and "0" is no reversal.

For an upward reversal everything is the same, respectively...

All prices of the last three candlesticks were taken as predictors, plus all possible differences between them (a rough sketch of how such a target and predictors could be built follows below).
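The post contains no code, so the sketch below is my reconstruction in R. The look-ahead horizon for the 0.2% drop is an assumption (the post does not say how far ahead the drop is measured), and the function names are mine.

# Sketch only (my reconstruction, not the poster's code).
# price: close prices; a bar is labelled "1" (reversal to sell) if the price
# drops by at least `drop` within the next `horizon` bars (horizon is assumed).
make_sell_target <- function(price, drop = 0.002, horizon = 10) {
  n <- length(price)
  sapply(seq_len(n), function(i) {
    if (i >= n) return(0L)                          # no future bars left
    fut <- price[(i + 1):min(n, i + horizon)]
    as.integer(min(fut) <= price[i] * (1 - drop))
  })
}

# Predictors: the 12 OHLC prices of the last three candles plus all their
# pairwise differences. Row j of the result describes bar j + 2.
make_predictors <- function(ohlc) {                 # ohlc: matrix with O, H, L, C columns
  ohlc <- as.matrix(ohlc)
  n    <- nrow(ohlc)
  t(sapply(3:n, function(i) {
    x     <- as.numeric(t(ohlc[(i - 2):i, ]))       # 12 raw prices
    pairs <- combn(length(x), 2)                    # every pair of those prices
    d     <- x[pairs[2, ]] - x[pairs[1, ]]          # 66 pairwise differences
    c(x, d)
  }))
}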

So, the nets are trained; we take the network predictions (their outputs) and plot them below the price. The outputs of the two networks, for buy and for sell, are shown below the price chart. See Fig. 1 for the sell; the same was done for the buy.

Green is the output of the buy network, red the output of the sell network.

Fig. 2

If you look closely at Fig. 2, you can see that when the probability of an upward reversal is higher than the probability of a downward reversal (the green chart is above the red one), the price always falls, even though we actually taught the network the opposite. So let's assume the market goes against its own statistics. First, let's build the cumulative sums of the buy and sell outputs:

cumsum(buy.neural); cumsum(sell.neural)


Fig. 3

Now let's build the difference between the cumulative sum of the buy network and the cumulative sum of the sell network:

cumsum(buy.neural) - cumsum(sell.neural)


Fig. 4

As can be seen in Fig. 4 from the blue chart and the price chart, the price is completely inversely correlated with the networks' forecasts (the blue chart). To make it even clearer, I will flip the sign of (invert) the blue chart:

(cumsum(buy.neural) - cumsum(sell.neural)) * -1
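The three expressions above are easy to reproduce; a minimal sketch, assuming buy.neural and sell.neural hold the two networks' per-bar probability outputs (names taken from the post) and price is the matching close series:

# buy.neural, sell.neural: per-bar probability outputs of the two networks;
# price: the close prices of the same bars
cum_buy  <- cumsum(buy.neural)
cum_sell <- cumsum(sell.neural)
synth    <- -(cum_buy - cum_sell)          # the inverted "blue chart"

# overlay both series, standardised, to see the co-movement described below
plot(as.numeric(scale(price)), type = "l", ylab = "standardised value")
lines(as.numeric(scale(synth)), col = "blue")
cor(price, synth)                          # strong co-movement, but lagging, not predictive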

Fig. 5

Looking at Fig. 5 there is no doubt: the price moves against the networks' forecasts. The interesting conclusion is that, using neural networks and their statistical probabilities, we were able to fully reconstruct the price based only on knowing the probability of whether a reversal happens on the next candle or not.

And all this is very cool, but in practice it is useless, because the blue chart has no predictive power: it does not lead the price, it keeps pace with it, so there is effectively no difference between looking at the price and looking at this synthetic chart. But the market mechanics are now clear, and they sound like this: "if the probability of a down reversal is higher than the probability of an up reversal, the price will go up"...

=========================================================================

Moving on....

Next, I trained hidden Markov models (HMMs).

It is a probabilistic model intended in particular for non-stationary data; they say it suits markets too...

One model identifies an up trend and gives a probabilistic estimate of it, and the other model gives the probability of a down trend.
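The post does not say which implementation was used; below is a minimal sketch with the depmixS4 package (my assumption), fitting a single two-state Gaussian HMM on returns instead of two separate models, just to show where such per-state probabilities come from.

library(depmixS4)

# returns: numeric vector of bar-to-bar returns (assumed input)
df  <- data.frame(ret = returns)
mod <- depmix(ret ~ 1, data = df, nstates = 2, family = gaussian())
fm  <- fit(mod)

# posterior() gives the decoded state plus per-state probability columns S1, S2
# (recent depmixS4 versions may ask for an explicit `type` argument).
post   <- posterior(fm)
p_up   <- post$S1    # check summary(fm) to see which state has the positive mean return
p_down <- post$S2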


Don't pay attention to the trades, I was experimenting there...

So, below the price we have two vectors with the probabilities of an up trend (green) and of a down trend (red); the black line just marks the peak probabilities the model puts out, it is simply a standard deviation, simpler than a Bollinger band.

So look: when the model starts to produce peak probabilities of some event (going beyond the black line downwards), the market does the opposite...
So here, too, we essentially got the price moving against its own statistics...
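The "black line" is described only as a standard-deviation threshold, simpler than a Bollinger band; one possible way to flag such probability peaks (my sketch, assuming p_up is one of the probability series above and using a rolling mean with a k-sigma band):

library(zoo)

# p_up: one of the probability series above; flag bars where it leaves a
# rolling mean +/- k*sd band (the analogue of the "black line" in the post)
k   <- 2
win <- 100
m   <- rollapply(p_up, win, mean, fill = NA, align = "right")
s   <- rollapply(p_up, win, sd,   fill = NA, align = "right")

peak <- !is.na(m) & abs(p_up - m) > k * s   # "peak probability" events
which(peak)
# The post's observation: after such peaks the market tends to do the opposite
# of what the peak probability suggests.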

And now let's think: if the market is such a beast, with such behaviour, can ML algorithms predict it at all? After all, random forests, neural networks, HMMs and the rest all make their predictions statistically, one way or another.

Basically, this is the answer to why a model breaks down practically on the second day after its optimization, even if that optimization was triple-genetic and quadruple-cross-validated...

What to do? I don't know yet

 
mytarmailS:

1)

==================================================================

The first thing I did was to train two deep networks with probabilistic outputs. In fact, any network will do; the main thing is that the output is not a hard class answer of "1" or "0". That is, the output will be, say, "0.13", meaning the current data belongs to class "1" with probability 0.13.

One network I trained exclusively for buys, the second exclusively for sells.

The signal (the target) for a sell is a point after which the price dropped by at least 0.2%.


The target looks like "00000000001001000000", where "1" is a reversal to sell and "0" is no reversal.

For an upward reversal everything is the same, respectively...

All prices of the last three candlesticks were taken as predictors, plus all possible differences between them.

So, the nets are trained; we take the network predictions (their outputs) and plot them below the price.

If you look closely at Fig. 2, you can see that the price chart goes against these probabilities.

Looking at Fig. 5 we have no doubt: the price moves against the networks' forecasts, and the interesting conclusion is that, using neural networks and their statistical probabilities, we were able to fully reconstruct the price based only on knowing the probability of whether a reversal happens at the next candle or not.

Intelligent people develop and train neural networks of all kinds, yet still miss simple things. I read your post and was quite surprised. If I understood it correctly, you, roughly speaking, found all the price drops of 0.2% after some high, then took three candlesticks near that high and performed some manipulation with their prices, reducing them to a certain probability with the help of a neural network. But excuse me, don't you think this approach is too primitive? :) You are digging in the wrong place. That's why the result is exactly the opposite of reality. I would characterize your approach this way: you are trying to take 3 pixels out of a Full HD picture and use those 3 pixels to form an idea of the whole picture. Okay, not the whole picture, but what is the probability of correctly predicting even 10% of the image area? I hope my example is clear. You don't need to look at the pixels to see the picture. In other words, you don't need to look at individual bars to understand the chart; you need to look at the chart as a whole. And the solution to the problem lies more in the realm of geometry than of algebra, physics or biology, for example. Although, when I read some of the research people do here, I get a strong feeling that they are trying to understand how a human being is built by studying geography. :)
 

BlackTomcat:
1) I read your post and was quite surprised. If I understood it correctly,

2) by bringing them to a certain probability with the help of a neural network. But, excuse me, don't you think this approach is too primitive yourself? :) You are digging in all the wrong places. That's why the result is just the opposite of reality.

3) I would characterize your approach as follows: you are trying to take 3 pixels out of a Full HD picture and, from those three pixels, get an idea of the whole picture. Okay, not the whole picture, but what is the probability of correctly predicting even 10% of the image area? I hope my example is clear. You don't need to look at the pixels to see the picture.

4) In other words, you don't need to look at individual bars to understand the graph, you need to look at the whole graph. And the solution to the problem lies more in the realm of geometry than in algebra, physics, or biology, for example. Although, when I read some of the research that people do here, I get a strong feeling that they are trying to understand human structure through geography. :)

1) Right...

2) OK, but then why are the probabilities exactly opposite? By that logic it should just be random noise, not an inverse correlation.

3) I agree, we need to get the most information in the most compressed form, that's why I've been talking lately about the volume profile, or some alternatives...

Do you have any suggestions on how to present the data to the network? Please share, that's why we are all here.

I absolutely agree with you; I've been puzzling over how to do it. For example, I need to memorize all the levels that fall within the range of the current price. How do I do that? How do I feed the levels into a network?

p.s. I beg you not to quote my entire post; a few words are enough to make clear that you are addressing me. Please delete the unnecessary text.

 
BlackTomcat:
Smart people develop and train all sorts of neural networks, yet still miss simple things. I read your post and was quite surprised. If I understood it correctly, you, roughly speaking, found all the price drops of 0.2% after some high, then took three candlesticks near that high and performed some manipulation with their prices, reducing them to a certain probability with the help of a neural network. But excuse me, don't you think this approach is too primitive? :) You are digging in the wrong place. That's why the result is exactly the opposite of reality. I would characterize your approach as follows: you are trying to take 3 pixels out of a Full HD picture and use those 3 pixels to form an idea of the whole picture. Okay, not the whole picture, but what is the probability of correctly predicting even 10% of the image area? I hope my example is clear. You don't need to look at the pixels to see the picture. In other words, you don't need to look at individual bars to understand the chart; you need to look at the chart as a whole. And the solution to the problem lies more in the realm of geometry than of algebra, physics or biology, for example. Although, when I read some of the research people do here, I get a strong feeling that they are trying to understand how a human being is built by studying geography. :)

I agree. You have to look at the whole picture.

But that only works for a static picture. That is, we could conditionally divide the whole picture into 100 parts, learn from 70 parts, and get excellent predictive ability on the other 30. That is roughly what we do when forecasting the market. So what's the problem? Why do problems appear as soon as we go to real time?

The catch is that the picture is not static. It is a movie. Naturally, having studied one frame of the movie and learned to make predictions within it, it is useless in real life to predict the neighbouring areas of the picture: the next frame is already different! And no frame in this movie has an exact copy; even if a frame similar to a past one appears in the future, the frames that follow it will not be the same as the ones that followed the similar frame in the past. That is the problem.

Thus, if you look at individual frames of the movie, you could even conclude that the frames are random, just as many are convinced that the market is largely, if not 100%, random. But we know that when we watch a movie it makes sense; we can often even predict how it will end! So what's the point? Maybe the point is that we have to look wider and investigate more global regularities that never change. For example, I once checked how many percent of a move the price retraces on average, and it came out to something like 30% (if memory serves), and the interesting part is that this figure is almost the same for all timeframes and all instruments (currency pairs and metals; I haven't checked CFDs and others, but it seems to be the same)! That is amazing. It is precisely such constant regularities that should be used, but it is often easier to do that without any neural networks, random forests and so on, because to use ML you need to be able to determine the meaning of the movie, and that is not easy, if it is possible at all.
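The ~30% retracement figure can be re-checked in many ways; below is one rough sketch (my own construction, not what was actually measured in the post): a threshold zigzag that marks swing points and averages how much of each swing the next swing takes back.

# Rough sketch only: a threshold zigzag that records a swing point whenever
# price reverses by more than `thr`, then measures what fraction of each swing
# is retraced by the swing that follows.
swing_retracement <- function(price, thr = 0.005) {
  pivots <- integer(0)   # indices of confirmed swing highs/lows
  dir    <- 1L           # +1: tracking an up-swing, -1: tracking a down-swing
  ext_i  <- 1L           # index of the extreme reached since the last pivot
  for (i in 2:length(price)) {
    if (dir == 1L) {
      if (price[i] > price[ext_i]) ext_i <- i
      if (price[i] < price[ext_i] * (1 - thr)) {   # down reversal confirmed
        pivots <- c(pivots, ext_i); dir <- -1L; ext_i <- i
      }
    } else {
      if (price[i] < price[ext_i]) ext_i <- i
      if (price[i] > price[ext_i] * (1 + thr)) {   # up reversal confirmed
        pivots <- c(pivots, ext_i); dir <- 1L; ext_i <- i
      }
    }
  }
  moves <- diff(price[pivots])                     # signed size of each swing
  retr  <- abs(moves[-1]) / abs(moves[-length(moves)])
  mean(pmin(retr, 1))    # cap at 100%; full reversals count as a complete retrace
}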

 
Andrey Dik:

And the catch is that the picture is not static. It's a movie.

Here's another analogy.

Almost everyone has a smart keyboard on their smartphone: you type a word and the keyboard suggests the next one, depending on that word and the words typed before it. I tried it; you can even compose fairly meaningful text just from the words the keyboard offers. Words are patterns, and a group of words is a group of patterns.

But this technology will be powerless in the market, just like the ML methods discussed here, because in the market the "words" change over time (the order and combination of the individual letters), and the meaning of individual "words" changes too. Only a certain higher meaning of the whole text remains, and that, of course, is not available to us.

Now people will ask me: so what do we do? I don't know what to do with ML; the result will still suck.

Or someone will say: "You just don't know how to cook ML!" Probably so, I don't. But who does? Who has managed to use ML in the market? Does anyone know of such successful examples? Yes, now they'll bring up Better as an example, but he too has flopped in the time since...

 
mytarmailS:

2) OK, but then why are the probabilities exactly opposite? By that logic it should just be random noise, not an inverse correlation.

Do you have any suggestions on how to present the data to the network? Please share, that's why we are all here.

I apologize for such a huge quote, but I'm writing from my phone right now, and the editing options here are limited: once you start trimming a quote, you can't get back to a clean field for your own text. On a PC this is easy to fix, but on a phone it's a problem.
On point 2: I agree that you should be getting pure randomness, but I'm not sure that the forward-test period where you got the inverse result follows immediately after the training period. Is there a time gap between those periods? Usually a regularity (if it really was in the market) stops working gradually: the balance curve in the tester first flattens its slope, and then falls. The regularity gets exhausted: it is recognized, and a lot of people start exploiting it, which turns it into an inverse pattern. However, if there was a logical (market) rationale behind the pattern, it may start working again after a while. And the following seems fair to me: the longer the pattern worked before, the longer the period of "oblivion" will be. But I haven't checked this thoroughly yet.
I don't work with neural networks, so I have no idea how to prepare data for training them. Graphical (geometric) patterns are easy to recognize by eye, but they are difficult to formalize. Right now I'm working on a TS that uses graphical methods. In my opinion, if there are any working regularities at all, they are there.
I would also like to add a comment to my previous post. There I seemed to dismiss the analysis of individual bars too harshly. In fact that is not quite so: analysing individual bars has a right to exist, but those key bars usually do not lie in the area of the tops.