Machine learning in trading: theory, models, practice and algo-trading - page 942

 
Olga Shelemey:
Thank you. He'll read it later. Right now he's asleep after reading some Shelepin. He said not to bother him.

Ok.

Let him find Takens' theorem.

 
Aleksey Vyazmikin:

I put everything into one table according to this principle

I also made a grouping by the arr_TimeH predictor - maybe in this form it will be useful for something.

I attach the files.

The program I'm using shows the following picture: only 30.81% of hits


However, if we merge errors -2 and -1 together, pool the correctly found solutions and set them against the incorrectly found ones, and ignore target 3 (it is a filter and will not affect the financial result), then we get the following picture

in this case the error on entering a position will be 49.19%, which is not so bad!
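As a sketch of that bookkeeping (Python, with made-up class counts standing in for the real table; the text reports 30.81% raw hits and 49.19% entry error):

```python
# Sketch of the aggregation above; the counts are placeholders,
# not the real table.
counts = {
    -2: 120,  # error class -2
    -1: 150,  # error class -1
     1: 180,  # correctly found, class 1
     2:  99,  # correctly found, class 2
     3: 300,  # target 3: a filter, excluded from financial results
}

wrong = counts[-2] + counts[-1]        # merge the two error classes
right = counts[1] + counts[2]          # pool the correctly found solutions
entry_error = wrong / (wrong + right)  # target 3 ignored entirely

print(f"entry error: {entry_error:.2%}")
```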

 
Maxim Dmitrievsky:

Have you experimented with k-fold in ALGLIB? Does it improve the results? I saw in one of your old posts that it doesn't seem to shuffle the sample. Does it even need to?
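(For reference, and not ALGLIB itself: a sketch of what shuffled versus unshuffled k-fold means, using sklearn; whether shuffling is appropriate for time-ordered data is exactly the open question here.)

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(-1, 1)  # stand-in for a time-ordered sample

# shuffle=False: each fold is a contiguous block of the series;
# shuffle=True: each fold mixes observations from the whole history.
for shuffle in (False, True):
    kf = KFold(n_splits=4, shuffle=shuffle,
               random_state=0 if shuffle else None)
    folds = [test.tolist() for _, test in kf.split(X)]
    print("shuffle =", shuffle, "->", folds[0], folds[1])
```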

Any conclusions as to what overfits less - the forest or the MLP? On the same data. I suspect that for regression tasks the forest works crookedly and returns suspiciously small errors; for classification it's fine.

2. I saw you were interested in OpenCL - any thought of rewriting the NN for it? For example, I gave up GA altogether as nightmarish nonsense; everything now trains in 1 thread on 1 core. You could speed it up on CL (although it's fast enough as it is). Or, if you train on Spark, it's already parallelized there and there's no point.

I more or less understood what you put on git and how you use it. Great, interesting work - respect! :)

Roffild:

When I started researching networks, the result was https://github.com/Roffild/RoffildLibrary/blob/master/Experts/Roffild/Alglib_MultilayerPerceptron.mq5. I went through different sets of predictors in different sequences (with and without shuffling) - the File_Num parameter is responsible for this. And, of course, I tried using the same number of records for the two classes.

The problem with this network is the lack of a clear criterion for selecting a valid sample. For example, when recognizing pictures of fruit, you yourself can clearly determine which is an apple and which is an orange. With price charts there is no 100% selection criterion, which means there is no 100% overfitting criterion either.

https://github.com/Roffild/RoffildLibrary/blob/master/Experts/Roffild/Alglib_RandomForest.mq5

A random forest is less dependent on noise and more often returns the same result under different sampling conditions. For example, see the graph.

The blue and yellow data are almost identical, although I expected more of a difference, because part of the sample was removed for the second forest.
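That stability check could be sketched like this (sklearn instead of the ALGLIB forest, synthetic data - all assumptions, not the original experiment):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, y_tr, X_te = X[:1500], y[:1500], X[1500:]

# Forest 1: the full training sample.
rf_full = RandomForestClassifier(n_estimators=100, random_state=1)
rf_full.fit(X_tr, y_tr)

# Forest 2: same settings, but with roughly a third of the sample removed.
keep = np.random.RandomState(2).rand(len(X_tr)) > 1 / 3
rf_part = RandomForestClassifier(n_estimators=100, random_state=1)
rf_part.fit(X_tr[keep], y_tr[keep])

agree = (rf_full.predict(X_te) == rf_part.predict(X_te)).mean()
print(f"prediction agreement: {agree:.1%}")  # typically very high
```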

And in my opinion, some people try to get the opening price of an order out of the net or the forest, but forget that the profit is taken when the order is closed. To solve this problem, https://github.com/Roffild/RoffildLibrary/blob/master/Include/Roffild/OrderData.mqh appeared, but this class is meant to be used precisely as a "parent" (base class).

OpenCL is only needed for training the network. For final calculations, where the network or forest has already been trained, OpenCL is of no use, as the time required to transfer the data to the video card is too long. The algorithms for getting an answer out of a network or forest are really very simple, and the CPU handles them very well.

Spark parallelizes the computations not only between the cores of a single computer, but can also use a whole network of computers. It is a standard for inter-server computing. For example, I usually buy 32 cores on Amazon at $0.25/hour to quickly get a finished random forest.
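A minimal sketch of that workflow with pyspark.ml (the file name and column names are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf-train").getOrCreate()

# Hypothetical CSV of precomputed predictors plus a "label" column.
df = spark.read.csv("predictors.csv", header=True, inferSchema=True)

features = [c for c in df.columns if c != "label"]
data = VectorAssembler(inputCols=features,
                       outputCol="features").transform(df)

# Spark spreads the tree building over every available core,
# or over a whole cluster of machines if one is attached.
rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=100)
model = rf.fit(data)
model.write().overwrite().save("rf_model")
```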

 
Roffild:

The problem with this network is the lack of a clear criterion for selecting a valid sample. For example, when recognizing pictures of fruit, you can clearly determine which is an apple and which is an orange. With price charts there is no 100% selection criterion, which means there is no 100% overfitting criterion either.

https://github.com/Roffild/RoffildLibrary/blob/master/Experts/Roffild/Alglib_RandomForest.mq5

A random forest is less dependent on noise and more often returns the same result under different sampling conditions. For example, see the graph.

That's because with an NN you have to select the architecture, while forests always work the same way, yes :)

And to select an architecture you have to visualize the multidimensional feature space and understand which layer should be responsible for what - or just go by gut feeling. But a correctly selected NN should, in theory, give better results, in terms of overfitting too.

I haven't looked through all the libraries yet - thanks, I'll study them further.

 
Nightmare
 
The deeper into the forest, the more firewood...
 

Another useful book in Russian.

Gulli A., Pal S. - The Keras Library: a deep learning tool [2018, PDF, RUS] :: RuTracker.org
  • rutracker.org
Author : 2018 : Gulli A., Pal S. : DMK Press : 978-5-97060-573-8 : Russian : PDF : Scanned pages : 298 : The book is a concise but thorough introduction to modern neural networks, artificial intelligence and deep learning technologies. It presents more than 20 working neural networks written in the language...
 
Vladimir Perervenko:

Another useful book in Russian.

How do you find Keras? Is it better than Darch, or are the capabilities the same? Can it learn faster on the same data, number of epochs, etc.?
 
elibrarius:
How do you find Keras? Is it better than Darch, or are the capabilities the same? Can it learn faster on the same data, number of epochs, etc.?

No comparison. Keras offers limitless possibilities in structure, training and customization, plus a lot of examples and detailed documentation. Separately, about TensorFlow: it is developing very fast (already at 1.8). Naturally, this has its own advantages and disadvantages. Training is not fast, and you have to do some extra legwork. Hyperparameters are difficult to optimize. Otherwise, it is the main direction to follow in the future.
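(A minimal Keras sketch of the kind of freedom in structure and training described above - Keras 2.x era API; the architecture here is arbitrary, not a recommendation.)

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Arbitrary architecture: layers, activations, optimizer, loss and
# callbacks are all freely swappable, which is the flexibility meant above.
model = Sequential([
    Dense(64, activation="relu", input_dim=30),
    Dropout(0.3),
    Dense(32, activation="tanh"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# model.fit(X_train, y_train, epochs=50, batch_size=128,
#           validation_split=0.2)  # training data not shown here
```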

Good luck

 
Aleksey Vyazmikin:

I still haven't made friends with R, so I'm interested to see what you come up with!

I made a decomposition of the weekly TF, 1400 bars (almost all the history available in the terminal).

The dates are not displayed here, so it's not very convenient. I will have to rewrite it through Plot or an indicator to mark it on a chart.

The cyclicality is more pronounced in the small modes. The largest one is ±14 years (2 half-periods of 28 years), which divides into four 7-year cycles (as I said). And the last 7-year cycle ended early this year (roughly), which suggests that it doesn't make much sense to train the net on earlier dates.

The cycles in the middle are not as pronounced.

And to avoid racking your brains, you can just feed all the modes into the NN - besides, they don't correlate with each other.

Then it will either recognize the different periodicities or it won't - a philosophical question; it will be however you make it :)
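(The decomposition tool isn't named here; if it is something like empirical mode decomposition, a sketch of extracting the modes and checking that they don't correlate might look like this, assuming the PyEMD package and a hypothetical data file.)

```python
import numpy as np
from PyEMD import EMD  # assumption: pip install EMD-signal

prices = np.loadtxt("weekly_close.csv")  # hypothetical: ~1400 weekly bars

imfs = EMD().emd(prices)  # one row per mode (IMF)
print("modes extracted:", imfs.shape[0])

# Check the claim that the modes barely correlate with one another.
corr = np.corrcoef(imfs)
off_diag = np.abs(corr - np.eye(len(corr)))
print("max off-diagonal |corr|:", off_diag.max().round(3))

# All modes together then form the NN input matrix, one column per mode.
X = imfs.T
```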
