Machine learning in trading: theory, models, practice and algo-trading - page 395

 
Dr. Trader:


I was still just starting to learn R back then; the script was almost entirely generated by rattle (a visual data-mining environment for R), which is why it is so complex and tuned for all occasions.


This one

should be changed to

And it should be ok.


In general, this is a bad approach; you should not determine the importance of inputs this way. For some reason it worked that time, but it never helped me again.

Experimented some more...

If you set 1-2 neurons in the hidden layer, the weights of the important inputs differ several-fold from the rest:

152.33, 7.82, 132.57, 12.19, 132.86, 10.54, 135.56, 19.16, 137.32, 14.84, 127.36, 7.43, 11.35, 6.66, 13.6, 10.18, 10.74, 10.66, 11.18, 8.95 (1 neuron)

If you set 10 (as in your second experiment), the weights are smeared across the neurons and you cannot tell the important inputs from the noise ones:

113963.27, 91026.57, 100833.22, 134980.44, 154190.05, 146455.03, 198703.01, 135775.2, 184353.78, 160766.79, 152433.73, 105753.11, 151673.83, 135421.64, 165343.94, 70277.93, 175038.87, 150342.56, 59153.02, 121012.76 (10 neurons)

Apparently, for that logic problem one neuron is optimal.
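As a toy illustration of why a single neuron keeps the weight-based importance check readable: with one sigmoid unit, |weight| per input separates informative inputs from noise ones. Everything below (the data, the training loop) is an illustrative assumption, not the original rattle script:

```python
# Sketch: one-neuron "network" trained by plain gradient descent;
# ranking inputs by absolute weight recovers the informative ones.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # only inputs 0 and 1 matter

w = np.zeros(20)
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # one sigmoid neuron
    grad_w = X.T @ (p - y) / len(y)         # logistic-loss gradient
    w -= 0.5 * grad_w
    b -= 0.5 * (p - y).mean()

importance = np.abs(w)
print(np.argsort(importance)[-2:])          # the two informative inputs
```

With many hidden neurons the same |weight| statistic gets spread across units, which matches the "smeared out" effect reported above.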

 
Maxim Dmitrievsky:


And try the decision trees from ALGLIB; they train faster and the quality is better than MLP. Deep learning also trains faster, but it is not in ALGLIB.

The main thing is the speed/quality ratio. What is the point of waiting a week, a day, or even an hour for a calculation... you will never find the optimal combination that way. :) The model should learn in a few seconds; then you can use genetics for automatic selection of parameters or predictors, and that is true AI, otherwise it is rubbish. :)

With the 1st column removed it is no longer 5%, but much worse...

Forest gives about the same error as MLP (but trains faster)

Average error on the training (60.0%) section = 0.264 (26.4%) nTrees=100 codResp=1
Average error on the validation (20.0%) section = 0.828 (82.8%) nTrees=100 codResp=1
Average error on the test (20.0%) section = 0.818 (81.8%) nTrees=100 codResp=1
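For reference, the three figures above come from a 60/20/20 split and a plain misclassification rate. A minimal sketch of that bookkeeping, with a trivial stand-in model instead of ALGLIB's forest (the data and the model are illustrative assumptions):

```python
# Sketch: 60/20/20 split and "average error" = share of misclassified rows.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

idx = rng.permutation(len(y))
n_train, n_val = int(0.6 * len(y)), int(0.2 * len(y))
train, val, test = np.split(idx, [n_train, n_train + n_val])

def avg_error(part, predict):
    # misclassification rate on one section of the data
    return np.mean(predict(X[part]) != y[part])

predict = lambda Z: (Z[:, 0] > 0).astype(int)   # stand-in for the trained model
for name, part in (("training (60.0%)", train),
                   ("validation (20.0%)", val),
                   ("test (20.0%)", test)):
    print(f"Average error on the {name} section = {avg_error(part, predict):.3f}")
```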

 
elibrarius:

With the 1st column removed it is no longer 5%, but much worse...

Forest gives about the same error as MLP (but trains faster)

Average error on the training (60.0%) section = 0.264 (26.4%) nTrees=100 codResp=1
Average error on the validation (20.0%) section = 0.828 (82.8%) nTrees=100 codResp=1
Average error on the test (20.0%) section = 0.818 (81.8%) nTrees=100 codResp=1


Yes, the classic MLP has no advantage over the forest; at least for me the forest always wins in both speed and quality.

By the way, deep learning may offer no advantage either... the neural network in Studio is a kind of analog of deep learning: there are not even layers as such, just several convolutional networks stacked one after another... like autoencoders that initialize each other in series (at least, that is what it says), but the result is still worse than the forest.

 
Mihail Marchukajtes:


I don't even know what to say, except to give an example from the report. Optimization produces the following; how to interpret it is up to everyone. When the optimization finishes, the result looks like this:

* Sensitivity of generalization ability: 55.12820512820513%

* Specificity of generalization ability: 55.5045871559633%

* Generalization ability: 55.309734513274336%

* TruePositives: 129

* FalsePositives: 105

* TrueNegatives: 121

* FalseNegatives: 97

* Total patterns in out of samples with statistics: 452

Highlighted in red is the overall result of the generalization ability. The first number is the percentage of correctly guessed ones, the second the percentage of correctly guessed zeros, and the third the total.
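Given the printed counts, the three percentages can be reproduced exactly. Note that in this report "sensitivity" and "specificity" appear to mean the share of correct predictions among the predicted ones and zeros respectively (not the textbook TP/(TP+FN) and TN/(TN+FP)):

```python
# Reproducing the report's three percentages from the confusion counts.
tp, fp, tn, fn = 129, 105, 121, 97

sensitivity = tp / (tp + fp)        # correct among predicted ones
specificity = tn / (tn + fn)        # correct among predicted zeros
overall     = (tp + tn) / (tp + fp + tn + fn)   # 452 patterns total

print(f"{sensitivity:.10%}")  # 55.1282051282%
print(f"{specificity:.10%}")  # 55.5045871560%
print(f"{overall:.10%}")      # 55.3097345133%
```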



Reshetov's classifier gives better results than MLP at least, and it does not overfit; that is its advantage... but it takes an awfully long time to run on your set: yesterday I computed for 2 hours, and today I continued again after hibernation... I am waiting for it to finish so I can compare the errors :)

Nevertheless, I want to rewrite it in MQL5 and run it on OpenCL for more efficient use. Then rent a Google cloud machine and compute the neural network in minutes (seconds?) on a Tesla, or buy the Tesla for 500,000 :) With 3,000 CUDA cores.

 
Maxim Dmitrievsky:


Reshetov's classifier gives better results than MLP at least, and it does not overfit; that is its advantage... but it takes an awfully long time to run on your set: yesterday I computed for 2 hours, and today I continued again after hibernation... I am waiting for it to finish so I can compare the errors :)

Nevertheless, I want to rewrite it in MQL5 and run it on OpenCL for more efficient use. Then rent a Google cloud machine and compute the neural network in minutes (seconds?) on a Tesla, or buy the Tesla for 500,000 :) With 3,000 CUDA cores.


Well, that is exactly its advantage: it does not overfit while the model becomes more complicated every time. That means we get the most complex (large) model that still does not overfit. So the model gets smarter, matures, so to speak. I was already thinking about Intel's Xeon math coprocessor, but it costs 200k. It has 60 cores, 120 logical. Just think about it: how can you build a model in 5 seconds, even on what you say is not a big set, and get a model adequate to a process as complex and non-stationary as a currency quote? To get an adequate model, you have to spend enough machine time; then the model will be adequate and will work longer.

I would still like to run it on a GPU. Even a 10x performance gain would be good... Maybe it will work?

 
Dr. Trader:

The results of the importance assessment are as follows. The higher a predictor sits in the table, the better. Only VVolum6, VDel1, VVolum9, and VQST10 passed the test.

In rattle we can build 6 models at once on these 4 predictors, and the SVM shows about 55% accuracy on the validation and test data. Not bad.


Well, great, the optimizer is computing now. I don't even know when it will finish, but I will definitely feed it these inputs and see what it produces, what this model turns out to be like... Thank you!
 
Maxim Dmitrievsky:


But it takes a hell of a long time to run on your set; yesterday it took me 2 hours, and today I continued again after hibernation. :)

Nevertheless, I want to rewrite it in MQL5 and run it on OpenCL for more efficient use. Then rent a Google cloud machine and compute the neural network in minutes (seconds?) on a Tesla, or buy the Tesla for 500,000 :) With 3,000 CUDA cores.


Again, how many cores are used in the calculations? I have 4 cores loaded at 100%, and I did not dare to run the full set of 452 rows, because I feel it would take a week, no less...
 
Mihail Marchukajtes:

Again, how many cores are used in the calculations? I have 4 cores loaded at 100%, and I did not dare to run the full set of 452 rows, because I feel it would take a week, no less...


Anyway, I dug into the latest version of the program, the one with parallelization, but it works differently from the first version: there are 2 neural networks in the committee, an MLP and the author's own, and they then interact when producing the result. A lot of code; respect to the author :) Is there a description of the latest version anywhere, the theory?

There is something very hard in it; in particular, THIS is used. A lot of time will have to be spent studying the code.

Try to contact the author himself and call on him; maybe he will parallelize it himself... since there is such a gesture

The GMDH method for socio-economic forecasting, mathematical modeling, statistical data analysis, analytical evaluation of systems, and programming.
  • Grigory Ivakhnenko
  • gmdh.net
The Group Method of Data Handling (GMDH) is applied in a wide variety of fields for data analysis and knowledge discovery, forecasting and systems modeling, optimization, and pattern recognition. Inductive GMDH algorithms offer a unique way to automatically find interdependencies in data, select the optimal structure of a model or network, and...
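The "automatic search for the optimal model structure" mentioned in the card can be sketched as one layer of a GMDH-style search, assuming the usual scheme: fit a small polynomial for every pair of inputs and keep the candidate that does best on an external validation set. The data here is synthetic, for illustration only:

```python
# Sketch: one layer of a GMDH-style search over pairwise polynomial models,
# selected by an external (validation) criterion.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 1.0 + 2 * X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=300)

train, val = np.arange(0, 200), np.arange(200, 300)

def design(i, j, rows):
    # quadratic polynomial in inputs i and j (the classic GMDH partial model)
    a, b = X[rows, i], X[rows, j]
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

def fit_pair(i, j, rows):
    coef, *_ = np.linalg.lstsq(design(i, j, rows), y[rows], rcond=None)
    return coef

def val_mse(i, j, coef, rows):
    return np.mean((design(i, j, rows) @ coef - y[rows]) ** 2)

best = min(combinations(range(5), 2),
           key=lambda p: val_mse(*p, fit_pair(*p, train), val))
print(best)   # the pair of inputs actually driving y
```

A full GMDH run would feed the surviving candidates into further layers until the external criterion stops improving; this shows only the selection step.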
 
Mihail Marchukajtes:


Well, that is exactly its advantage: it does not overfit while the model becomes more complicated every time. That means we get the most complex (large) model that still does not overfit. So the model gets smarter, matures, so to speak. I was already thinking about Intel's Xeon math coprocessor, but it costs 200k. It has 60 cores, 120 logical. Just think about it: how can you build a model in 5 seconds, even on what you say is not a big set, and get a model adequate to a process as complex and non-stationary as a currency quote? To get an adequate model, you have to spend enough machine time; then the model will be adequate and will work longer.

I would still like to run it on a GPU. Even a 10x performance gain would be good... Maybe it will work?

Isn't Reshetov's classifier still a single neuron rather than a network? Or have the neurons been combined into a network in Reshetov's version?
 
Maxim Dmitrievsky:


In short, I dug into the latest version of the program, the one with parallelization, but it works differently from the first version: there are 2 neural networks in the committee, an MLP and the author's own, and they then interact when producing the result. A lot of code; respect to the author :) Is there a description of the latest version anywhere, the theory?

There is something very hard in it; in particular, THIS is used. A lot of time will have to be spent studying the code.

Try to contact the author himself; maybe he will parallelize it himself... since there is such a gesture


I don't think you will get through to the author; I wrote to him both ways and he says nothing. But as far as I know, he once wrote that everything in it that can be parallelized is parallelized; those are just his words, though. Yes, indeed, he trains two grids that work in a committee; I wrote about that in my article. If both show "yes", then it is yes; if both show "no", then no; and if they disagree, then "don't know". I do not know about the latest version, but the basic description is on Google, at the link you gave me. I once ran version 3 on a VPS server and, to my disappointment, the optimizer loaded only one core, but recent versions load all the cores evenly, so I think the parallelization is there after all. Only one thing remains: to increase the number of cores :-)
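The committee logic described above (both "yes" → yes, both "no" → no, mismatch → "don't know") maps directly to a few lines; the two predictions here are stand-ins for the actual networks:

```python
# Sketch: two-model committee with a "don't know" outcome on disagreement.
def committee(pred_a, pred_b):
    """Combine two binary predictions; None means 'don't know'."""
    if pred_a == pred_b:
        return pred_a          # both say 1, or both say 0
    return None                # mismatch -> abstain

print(committee(1, 1))  # 1
print(committee(0, 0))  # 0
print(committee(1, 0))  # None
```

The practical point of the abstain branch is that the model only trades when both networks agree, at the cost of skipping some signals.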