Machine learning in trading: theory, models, practice and algo-trading - page 135

 
Andrey Dik:
For how long?

I understand that your code in the article is already suitable for simulating trading on the found points and can be used. ) That's great.

Well, let's say in a couple of days of machine time (or a week). Some foreseeable time frame, anyway.

 
Alexey Burnakov:

I understand that your code in the article is already suitable for simulating trading on the found points and can be used. ) That's great.

Well, let's say in a couple of days of machine time.

The algorithm itself works and doesn't require (if I'm not too picky) any additional work, while the code fragment with the perfect ZZ can, with a little extra work, essentially be used as a trading simulation.

In a couple of days? - No, I think you can get good results much faster, in hours or even minutes. In the article the search took something like a few seconds, but there were only 100 bars; in your case it will certainly take longer. You can experimentally adjust the number of epochs so as to obtain a result of the given accuracy within the specified time.
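Roughly like this, as a sketch (this is not the article's code; run_epoch() is just a made-up placeholder for one pass of the search, and the thresholds are up to you):

# Run the search epoch by epoch and stop as soon as either the target
# quality or the time budget is reached.
search_with_budget <- function(run_epoch, target_fitness, max_seconds, max_epochs = 10000L) {
  started <- Sys.time()
  best <- -Inf
  epochs <- 0L
  while (epochs < max_epochs) {
    epochs <- epochs + 1L
    best <- max(best, run_epoch())                               # one epoch of the search
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    if (best >= target_fitness || elapsed >= max_seconds) break  # accuracy or time budget hit
  }
  list(best = best, epochs = epochs, seconds = elapsed)
}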

 
Andrey Dik:

The algorithm itself works and doesn't require (if I'm not too picky) any additional work, while the code fragment with the perfect ZZ can, with a little extra work, essentially be used as a trading simulation.

In a couple of days? - No, I think you can get good results much faster, in hours or even minutes. In the article the search took something like a few seconds, but there were only 100 bars; in your case it will certainly take longer. You can experimentally adjust the number of epochs so as to obtain a result of the given accuracy within the specified time.

Okay. Thank you. I'll keep trying. I want to generate the entries for minute bars over the entire history, and I'm going to use them in my experiment.

Nice work you've done. And the fact that the ideal entries do not necessarily coincide with ZZ logic is a non-trivial and important conclusion.

 
Alexey Burnakov:

Okay. Thank you. I'll keep trying. I want to generate the entries for minute bars over the entire history, and I'm going to use them in my experiment.

Nice work you've done. And the fact that the ideal entries do not necessarily coincide with ZZ logic is a non-trivial and important conclusion.

And thank you. Few people paid attention to the part highlighted in bold, for some reason...

I would like to add. The optimization in the article takes the instrument's average spread into account, but now I'm inclined, or rather sure, that the optimization should be done without the spread, while the test trade runs should be calculated with the spread.

 
Andrey Dik:

And thank you. Few people paid attention to the part highlighted in bold, for some reason...

I would like to add. The optimization in the article takes the instrument's average spread into account, but now I'm inclined, or rather sure, that the optimization should be done without the spread, while the test trade runs should be calculated with the spread.

For optimization "on every bar" the spread has to be taken into account, of course. Otherwise it will be a deal on every bar in the direction of the next open price. The spread makes the task non-linear and defines the optimal deal configuration.
 
Alexey Burnakov:

We will consider the LSTM later.

For now, a colleague and I have reached an R^2 of 0.2 on the test: a few convolutional filters and a few neurons in a fully connected layer. The idea is that recurrence is not needed there; what is needed is proper feature extraction.

So far the results for my problem are as follows (all R^2 estimates on the test set):

ARIMA: 0.14

MLP (fully connected NN): 0.12-0.15

GBM: 0.1

Convolutional Net (simple, not developed well): at least 0.2

Thus, the simulated dependence really turned out to be far from simple, and the popular methods fail on it. Let's keep improving the convolutional network.

If anyone has time to try to solve the problem (with some recurrent network), please share the result.

 
Alexey Burnakov:

So far the results for my problem are as follows (all R^2 estimates on the test set):

ARIMA: 0.14

MLP (fully connected NN): 0.12-0.15

GBM: 0.1

Convolutional Net (simple, not developed well): at least 0.2

Thus, the simulated dependence really turned out to be far from simple, and the popular methods fail on it. Let's keep improving the convolutional network.

If anyone has time to try to solve the problem (with some recurrent network), please share the result.

I have also been working on my problem. I, too, used a convolutional neural network with fully connected layers on top.

The best result I got with R^2 on the test was 0.23.

It seems to be the ceiling. Increasing the complexity of the network no longer gives anything. But the network's output is not perfect. This is a scatter plot of the response against the model output; a narrow slanted cloud would be expected. In fact, you can see that the complex response function is not fully reproduced by the network (the jumps are not captured). The function the network produces is much smoother than the one I built in.
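For reference, the check itself is trivial (just a sketch; actual and predicted stand for the test-set vectors):

# R^2 on the test set and the scatter plot of response vs model output.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}
# plot(predicted, actual, pch = 16, cex = 0.3); abline(0, 1, col = "red")
# a good fit would show a narrow cloud hugging the diagonal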


Maybe people familiar with neural networks will have some thoughts on how such a complex function can be modelled (for example):

Increasing the number of layers, neurons?

In fact, without prepared input variables, all the popular methods fall flat. Convolution can potentially extract the right features on its own (through integration, differentiation, non-linear smoothing), and the network then trains normally on them. That is the power of convolution.
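For the record, the architecture is roughly this simple; a minimal sketch in R with the keras package (not the code we actually ran; the 100-bar window and the layer sizes are purely illustrative):

library(keras)

# A few 1D convolutional filters for feature extraction,
# then a small fully connected layer on top, regression output.
model <- keras_model_sequential() %>%
  layer_conv_1d(filters = 8, kernel_size = 5, activation = "relu",
                input_shape = c(100, 1)) %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1)

model %>% compile(optimizer = "adam", loss = "mse")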

By the way, Mr. Perervenko said almost nothing about this type of network in his article on neural networks; I found only one mention in the entire article. The question of their applicability to time series could have been covered (and thoughtfully).

Alexey

 

The main thing is to give more inputs.

And training examples.

 
Vadim Shishkin:

The main thing is to give more inputs.

And training examples.

This is enough.
 

The more neurons there are in the hidden layer, the more complex a function the network can describe; to describe a complex target you need more hidden layers and more neurons in them.

But then the problem is that the network uses consecutive additions and multiplications (plus, for example, sigmoids as the activation function) to describe the target, so what you get is obviously not your original function but some approximation of it. And it may well turn out that this approximation memorizes peculiarities of the training data that will not work correctly on new data. So you sometimes need to pause training, check whether the error on the test sample has decreased, and continue training if all is well. At some point the error on the test data will start to grow, and then training must be stopped completely.
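The idea in code, very roughly (my own sketch, not tied to any particular package; train_step() and test_error() are placeholders the caller would supply):

# Train in small chunks, watch the error on the held-out sample,
# and stop once it has stopped improving.
train_with_early_stopping <- function(train_step, test_error, max_rounds = 100L, patience = 5L) {
  best <- Inf
  bad  <- 0L
  for (round in seq_len(max_rounds)) {
    train_step()                                   # a few more epochs of training
    err <- test_error()                            # error on the held-out sample
    if (err < best) { best <- err; bad <- 0L } else { bad <- bad + 1L }
    if (bad >= patience) break                     # test error keeps growing - stop
  }
  best
}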

Also, the network's output is limited by its activation function: for the popular ones, the sigmoid gives (0;1) and ReLU gives [0;inf). The target values need to be rescaled into such an interval; outputs in the range (-7;7) are simply unattainable for many packages.
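The rescaling itself is a one-liner, for example (the -7..7 bounds are just the numbers from this post):

# Map targets from their own range into (0, 1) for a sigmoid output,
# and back again after prediction.
to_unit   <- function(y, lo, hi) (y - lo) / (hi - lo)
from_unit <- function(s, lo, hi) lo + s * (hi - lo)
y  <- c(-7, -3.5, 0, 3.5, 7)
ys <- to_unit(y, -7, 7)             # 0.00 0.25 0.50 0.75 1.00
all.equal(from_unit(ys, -7, 7), y)  # TRUE - the transform is invertible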
