Machine learning in trading: theory, models, practice and algo-trading - page 393

 
Mihail Marchukajtes:
Hi all!!! I'm glad this thread hasn't died out and is still going, so I have a question for the public. I have a dataset for training, but unfortunately it has grown so big that training takes too long. Could someone build a model using their own experience, and then we'll see how it works together!!!

Try leaving only these inputs (numbered from 0, where 0 is the first column):

0,4,50,53,59,61,64,92,98,101,104,
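For anyone who wants to reproduce this, a minimal sketch of keeping just those 0-based columns, assuming the attached dataset is a headerless CSV (the filenames here are hypothetical, not from the thread):

```python
import pandas as pd

df = pd.read_csv("dataset.csv", header=None)   # hypothetical filename

# The suggested inputs, numbered from 0 (0 is the first column)
keep = [0, 4, 50, 53, 59, 61, 64, 92, 98, 101, 104]

# Assumption: the target is the last column of the file
subset = df.iloc[:, keep + [df.shape[1] - 1]]
subset.to_csv("dataset_truncated.csv", index=False, header=False)
```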

Files: [attached dataset]
 

I was able to train on the truncated part of the dataset. Here is the network's out-of-sample result, but I had to mirror the model completely.

The performance from 05.29 onward is quite good, in my opinion.


 
elibrarius:

Try leaving only these inputs (numbered from 0, where 0 is the first column):

0,4,50,53,59,61,64,92,98,101,104,


The advantage of the optimizer is that it removes unnecessary columns; that's why it takes so long. But now I will try to optimize the full dataset I posted, taking your recommendations into account, and then see what the result is out of sample. OK?
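The column pruning described here is essentially wrapper-style feature selection, and the repeated retraining is what makes it slow. A rough sketch of the idea with greedy backward elimination (a generic scikit-learn illustration, not the actual optimizer used in this thread):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def backward_eliminate(X, y, min_features=5):
    """Greedily drop the column whose removal hurts CV accuracy least;
    stop once every removal makes things worse."""
    model = lambda: LogisticRegression(max_iter=1000)
    cols = list(range(X.shape[1]))
    best = cross_val_score(model(), X[:, cols], y, cv=3).mean()
    while len(cols) > min_features:
        trials = []
        for c in cols:
            rest = [k for k in cols if k != c]
            score = cross_val_score(model(), X[:, rest], y, cv=3).mean()
            trials.append((score, c))
        score, worst = max(trials)       # best score after removing one column
        if score < best:                 # all removals hurt -> stop
            break
        best = score
        cols = [k for k in cols if k != worst]
    return cols, best
```

Each pass retrains the model once per remaining column, so on a dataset with hundreds of columns the cost grows quickly, which matches the complaint about how long the full dataset takes.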
 
Mihail Marchukajtes:

The advantage of the optimizer is that it removes unnecessary columns; that's why it takes so long. But now I will try to optimize the full dataset I posted, taking your recommendations into account, and then see what the result is out of sample. OK?


Well, great!!! I've started the training. Given that there are far fewer columns, I think it will compute quickly; I'll post the result :-)

Sure enough, the first training run gave 55% generalization ability.

 

Strange as it may seem, with these inputs the model also has to be reversed, and then it is possible to obtain the following equity over the same period:

It is a bit worse, but it has its place, too.

I wish we could run the entire dataset through the optimizer. I think more columns would then be selected and the level of generalization would be higher, and hence so would the quality of the network out of sample...

 
Mihail Marchukajtes:


Well, great!!! I've started the training. Given that there are far fewer columns, I think it will compute quickly; I'll post the result :-)

Sure enough, the first training run gave 55% generalization ability.

What is this 55% generalizability?
A normal MLP 11-5-1 gives:
Average error on the training set (60.0%) = 0.057 (5.7%) nLearns=2 NGrad=332 NHess=0 NCholesky=0 codResp=2
Average error on the validation set (20.0%) = 0.038 (3.8%) nLearns=2 NGrad=332 NHess=0 NCholesky=0 codResp=2
Average error on the test set (20.0%) = 0.023 (2.3%) nLearns=2 NGrad=332 NHess=0 NCholesky=0 codResp=2
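For reference, a minimal sketch of the same shape of experiment — an 11-5-1 net with a 60/20/20 train/validation/test split — using scikit-learn on placeholder data (elibrarius's own trainer is a different tool; the data, library choice, and error metric here are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder data: 11 inputs, one target (stands in for the real price data)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 11))
y = rng.normal(size=2000)

# 60% train, 20% validation, 20% test
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# One hidden layer of 5 neurons: the 11-5-1 architecture
net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)

for name, Xs, ys in (("training", X_tr, y_tr),
                     ("validation", X_val, y_val),
                     ("test", X_te, y_te)):
    err = np.mean(np.abs(net.predict(Xs) - ys))
    print(f"Average error on the {name} set = {err:.3f}")
```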
 
elibrarius:
What is this 55% generalizability?
A normal MLP 11-5-1 gives:
Average error on the training set (60.0%) = 0.057 (5.7%) nLearns=2 NGrad=332 NHess=0 NCholesky=0 codResp=2
Average error on the validation set (20.0%) = 0.038 (3.8%) nLearns=2 NGrad=332 NHess=0 NCholesky=0 codResp=2
Average error on the test set (20.0%) = 0.023 (2.3%) nLearns=2 NGrad=332 NHess=0 NCholesky=0 codResp=2


I don't even know what to answer, except to give an example from the report. How to interpret the optimization result is up to each person, but when the optimization finishes, it looks like this:

* Sensitivity of generalization ability: 55.12820512820513%

* Specificity of generalization ability: 55.5045871559633%

* Generalization ability: 55.309734513274336%

* TruePositives: 129

* FalsePositives: 105

* TrueNegatives: 121

* FalseNegatives: 97

* Total patterns in out of samples with statistics: 452

The line highlighted in red in the report, "Generalization ability", is the overall result: the first figure is the percentage of correctly guessed ones, the second is the percentage of correctly guessed zeros, and the third is the overall total.
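For what it's worth, the three percentages follow directly from the four counts in the report. A minimal check (note that, judging by the numbers, "sensitivity" here is computed over the predicted ones, TP/(TP+FP), and "specificity" over the predicted zeros, TN/(TN+FN) — i.e. what is usually called precision and negative predictive value):

```python
TP, FP, TN, FN = 129, 105, 121, 97

sensitivity = TP / (TP + FP)                  # share of predicted ones that were right
specificity = TN / (TN + FN)                  # share of predicted zeros that were right
overall = (TP + TN) / (TP + FP + TN + FN)     # total over all 452 patterns

print(f"Sensitivity: {sensitivity:.10%}")         # 55.1282051282%
print(f"Specificity: {specificity:.10%}")         # 55.5045871560%
print(f"Generalization ability: {overall:.10%}")  # 55.3097345133%
```

All three printed values match the report above exactly.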


 
Mihail Marchukajtes:


I don't even know what to answer, except to give an example from the report. How to interpret the optimization result is up to each person, but when the optimization finishes, it looks like this:

* Sensitivity of generalization ability: 55.12820512820513%

* Specificity of generalization ability: 55.5045871559633%

* Generalization ability: 55.309734513274336%

* TruePositives: 129

* FalsePositives: 105

* TrueNegatives: 121

* FalseNegatives: 97

* Total patterns in out of samples with statistics: 452

The line highlighted in red in the report, "Generalization ability", is the overall result: the first figure is the percentage of correctly guessed ones, the second is the percentage of correctly guessed zeros, and the third is the overall total.


The MLP guesses right 95% of the time... I think you're reinventing the wrong wheel) No offense.
I'm reinventing my own wheel too, but based on the MLP, which has been proven over decades (and which, as they say, is outdated and should be replaced by something cooler). So I'm all for reinventing wheels. Maybe there's a logic error somewhere in your code? I've already found a few in mine while testing different variants, and along the way solved the problem from the first post of this thread. But the same filters that cut off the unnecessary in your problem cut off the necessary in that one ((. So I think I need to do the sifting by input weights, with a full run on the complete data.
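A sketch of the kind of sifting by input weights mentioned here, assuming a single-hidden-layer MLP: train once, rank inputs by the total magnitude of their first-layer weights, and keep the strongest (a crude heuristic for illustration, not elibrarius's actual code):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def sift_inputs_by_weight(X, y, keep_fraction=0.5):
    """Train once on all inputs, then keep the columns whose
    first-layer weights have the largest total magnitude."""
    net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
    net.fit(X, y)
    W = net.coefs_[0]                    # shape: (n_inputs, n_hidden)
    importance = np.abs(W).sum(axis=1)   # one score per input column
    n_keep = max(1, int(keep_fraction * X.shape[1]))
    return np.argsort(importance)[::-1][:n_keep]
```

Unlike the wrapper approach sketched earlier, this needs only one training run on the full data, which is what makes it attractive for large datasets.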
 
elibrarius:
The MLP guesses right 95% of the time... I think you're reinventing the wrong wheel) No offense.
I'm reinventing my own wheel too, but based on the MLP, which has been proven over decades (and which, as they say, is outdated and should be replaced by something cooler). So I'm all for reinventing wheels. Maybe there's a logic error somewhere in your code? I've already found a few in mine while testing different variants.


The thing is, I don't program. This optimizer wasn't written by me; I just use it. In any case, the inputs you specified give 55 percent generalization, which is better than random guessing, hence the out-of-sample result with a positive profit. The only thing that bothers me now is that the model has to be mirrored: then it gains, while if the model is taken straight, it loses...

But if we run the optimizer on all the inputs, I think the model will be much more complex and will select more inputs. In theory, such a model should work better and longer. But I cannot run the optimizer on the full dataset; I think it would take a month to compute. So my hope is to get the optimizer running on the GPU, and then we'll see.
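As an aside, the "mirroring" described above just means trading the opposite of every signal. A minimal sketch of that inversion for a 0/1 classifier (illustration only; the optimizer itself does not expose this):

```python
def mirror(signals):
    """Invert binary trading signals: predicted 1 (buy) becomes 0 (sell)
    and vice versa, which flips the sign of the resulting equity curve."""
    return [1 - s for s in signals]

print(mirror([1, 0, 0, 1, 1]))  # -> [0, 1, 1, 0, 0]
```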

 
Mihail Marchukajtes:


The thing is, I don't program. This optimizer wasn't written by me; I just use it. In any case, the inputs you specified give 55 percent generalization, which is better than random guessing, hence the out-of-sample result with a positive profit. The only thing that bothers me now is that the model has to be mirrored: then it gains, while if the model is taken straight, it loses...

But if we run the optimizer on all the inputs, I think the model will be much more complex and will select more inputs. In theory, such a model should work better and longer. But I cannot run the optimizer on the full dataset; I think it would take a month to compute. So my hope is to get the optimizer running on the GPU, and then we'll see.

If you're going to run something for a month, use an uninterruptible power supply; my power went out after about two weeks of calculations)).
And don't wait for the GPU: rewriting the code takes even longer, and if the author didn't do it, hardly anyone else will finish the task.