Machine learning in trading: theory, models, practice and algo-trading - page 377

 
Vladimir Perervenko:

After splitting into train/test/valid, shuffle the train set. Do not shuffle the other sets.
This holds for classification with neural networks. Moreover, when training deep neural networks, shuffle every minibatch before feeding it to the network.

Good luck


Can you give me a link where I can read about shuffling? Purely intuitively it makes no sense :) just like the correlation of predictors with the target (we did sort that one out, though only with difficulty).
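
For readers following along, the advice quoted above could be sketched roughly as follows. This is only an illustration, not Vladimir's actual code: the function names, the 60/20/20 fractions and the batch size are hypothetical, and "shuffle every minibatch" is interpreted here as drawing minibatches from a freshly reshuffled train set on every epoch.

```python
import random

def chronological_split(rows, train_frac=0.6, valid_frac=0.2):
    """Split time-ordered rows into train/valid/test without reordering them."""
    n = len(rows)
    n_train = round(n * train_frac)
    n_valid = round(n * valid_frac)
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]
    return train, valid, test

def minibatches(train, batch_size, rng):
    """Yield minibatches drawn from a fresh random ordering of the train set."""
    order = list(range(len(train)))
    rng.shuffle(order)  # a new ordering on every call, i.e. on every epoch
    for start in range(0, len(order), batch_size):
        yield [train[i] for i in order[start:start + batch_size]]

rows = list(range(100))           # hypothetical time-ordered samples
rng = random.Random(42)           # fixed seed only to make the sketch repeatable

train, valid, test = chronological_split(rows)
epoch_batches = list(minibatches(train, batch_size=16, rng=rng))
```

Only the train rows are ever reordered; valid and test keep their original time order, so the error on them is measured on data that lies after the training period.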
 
Vladimir Perervenko:

After splitting into train/test/valid, shuffle the train set. Do not shuffle the other sets.
This holds for classification with neural networks. Moreover, when training deep neural networks, shuffle every minibatch before feeding it to the network.

Good luck

I found an example of shuffling train and valid together in the ALGLIB ensemble calculation function. Apparently it is one of the methods.

I shuffled only train.

Average error on training (80.0%) section =0.396 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6
Average error on validation (20.0%) section =0.391 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6
Full section (training + validation):
Average learning error=0.395 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6
Average error on test (20%) section =0.398 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6

The error is the same on all sections, just as it was when shuffling train and valid together. Apparently the effect is the same.

 
elibrarius:

I found an example of shuffling train and valid together in the ALGLIB ensemble calculation function. Apparently it is one of the methods.

I shuffled only train.

Average error on training (80.0%) section =0.396 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6
Average error on validation (20.0%) section =0.391 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6
Full section (training + validation):
Average learning error=0.395 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6
Average error on test (20%) section =0.398 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6

The error is the same on all sections, just as it was when shuffling train and valid together. Apparently the effect is the same.


What is the error on a file separate from these?

 
SanSanych Fomenko:


What is the error on a file separate from these?

Do you mean on the test section?

Average error on test (20%) section =0.398 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6

I haven't made a test2 section yet, so for now I will only screen on test1. (Maybe in the future.)

 
elibrarius:

Do you mean on the test section?

Average error on test (20%) section =0.398 nLearns=2 NGrad=1208 NHess=0 NCholesky=0 codResp=6

I haven't made a test2 section yet, so for now I will only screen on test1. (Maybe in the future.)


Outside of all these samples
 
SanSanych Fomenko:

Outside of all these samples.
There is no outside, I used all the data.
 
elibrarius:
There is no outside, I used all the data.

Can you split the source file 80/20? Then do all your exercises on the 80%, and afterwards check the model on the remaining 20% without any shuffling.
 
SanSanych Fomenko:

Can you split the source file 80/20? Then do all your exercises on the 80%, and afterwards check the model on the remaining 20% without any shuffling.

With shuffling:

Average error on training (51.0%) section =0.683 (68.3%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Average error on validation (13.0%) section =0.685 (68.5%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Full section (training + validation):
Average learning error=0.683 (68.3%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Average error on test (16.0%) section =0.661 (66.1%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Average error on test 2 (20.0%) section =0.671 (67.1%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6

Without shuffling:

Average error on training (51.0%) section =0.516 (51.6%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Average error on validation (13.0%) section =0.376 (37.6%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Full section (training + validation):
Average learning error=0.491 (49.1%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Average error on test (16.0%) section =0.344 (34.4%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Average error on test 2 (20.0%) section =0.326 (32.6%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6

Only 2 retraining cycles, for speed... time for bed )
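
The section sizes in the logs above (51%, 13%, 16%, 20%) appear consistent with applying an 80/20 cut three times in a row. The sketch below only reproduces that arithmetic; how the split was actually coded is not shown in the thread, so the helper `split_tail`, the sample count and the assumption that each cut takes the most recent rows as the tail are all hypothetical.

```python
def split_tail(rows, tail_frac):
    """Cut a time-ordered list into (head, tail); the tail is the latest part."""
    cut = round(len(rows) * (1 - tail_frac))
    return rows[:cut], rows[cut:]

rows = list(range(1000))                 # hypothetical time-ordered samples

work, test2 = split_tail(rows, 0.2)      # 80% to work with, 20% held out untouched
learn, test1 = split_tail(work, 0.2)     # 64% for learning, 16% first test section
train, valid = split_tail(learn, 0.2)    # 51.2% training, 12.8% validation

# Share of the whole file that ends up in each section.
shares = {
    "train": len(train) / len(rows),
    "valid": len(valid) / len(rows),
    "test1": len(test1) / len(rows),
    "test2": len(test2) / len(rows),
}
```

With this layout any shuffling would be confined to `train` (or `train` plus `valid`), while `test2` is never reordered and never seen during training or model selection.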

 
elibrarius:

With shuffling:

Average error on training (51.0%) section =0.683 (68.3%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Average error on validation (13.0%) section =0.685 (68.5%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Full section (training + validation):
Average learning error=0.683 (68.3%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Average error on test (16.0%) section =0.661 (66.1%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6
Average error on test 2 (20.0%) section =0.671 (67.1%) nLearns=2 NGrad=725 NHess=0 NCholesky=0 codResp=6

Without shuffling:

Average error on training (51.0%) section =0.516 (51.6%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Average error on validation (13.0%) section =0.376 (37.6%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Full section (training + validation):
Average learning error=0.491 (49.1%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Average error on test (16.0%) section =0.344 (34.4%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6
Average error on test 2 (20.0%) section =0.326 (32.6%) nLearns=2 NGrad=1063 NHess=0 NCholesky=0 codResp=6

Only 2 retraining cycles, for speed... time for bed )


Your model isn't learning anything; the results are essentially random. Somewhere it picks something up, and then that turns out to be irrelevant.

Start with data mining: first define the target, then search for predictors that are relevant to the target, then determine the predictive ability of the selected predictors for that specific target, and only then build the model.


Everything else is an intellectual game of numbers.
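
The post does not say how "predictive ability" should be measured. One possible first screen, chosen here purely for illustration, is the mutual information between a discretized predictor and the target. The function and the toy data below are hypothetical, and such a score would normally be estimated on the training section only.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Toy data: a binary target and two candidate predictors.
target      = [0, 1, 0, 1, 0, 1, 0, 1]
informative = [0, 1, 0, 1, 0, 1, 0, 1]   # repeats the target exactly
noise       = [0, 0, 1, 1, 0, 0, 1, 1]   # independent of the target in this sample

mi_informative = mutual_information(informative, target)
mi_noise = mutual_information(noise, target)
```

A predictor whose score stays near zero on the training section is a candidate for removal before any model is fitted; a score that is high in one period and near zero in the next would suggest the relationship is not stable.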

 

https://www.youtube.com/channel/UCLk-Oih8VlqF-StidijTUnw

Found something to do over the weekend :) R for newbies.

And the guy even does algo trading.


Fundamentals of Data Analysis
  • www.youtube.com
The channel contains courses on data analysis. All courses are free and easy to learn. I am studying at Denis Konovalov's free school http://superpartnerka.bi...