Machine learning in trading: theory, models, practice and algo-trading - page 3281
Well, you need Pearson.
I'm not sure how to do it, and I'm sleepy.
Something similar.
Yeah, that's not it.
Right, wrong.
It's almost something, look it up, I'm off.
Trying to quickly find similar short subsequences in a long series.
Such an implementation via Alglib takes more than six seconds to search for a similar short subsequence (length 300) in a series of a million elements.
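For illustration, a minimal sketch (in Python with numpy, not the author's Alglib code) of the straightforward approach: slide a window along the long series and compute the Pearson correlation with the pattern at every offset. This is O(n*m), which explains the multi-second timing.

```python
import numpy as np

def sliding_pearson_naive(series: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Pearson correlation of `pattern` with every window of `series`."""
    m = len(pattern)
    out = np.empty(len(series) - m + 1)
    for i in range(len(out)):
        window = series[i:i + m]
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r
        out[i] = np.corrcoef(window, pattern)[0, 1]
    return out

rng = np.random.default_rng(0)
series = rng.standard_normal(1_000_000)   # placeholder for the 1M series
pattern = rng.standard_normal(300)        # placeholder for the length-300 pattern
# r = sliding_pearson_naive(series, pattern)  # slow: ~1M windows
# best = np.argmax(r)  # offset of the most similar subsequence
```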
I accelerated it.
Result.
Now in 300 milliseconds.
When no matrix approach can handle it: it takes three seconds to find similar subsequences of length 30K in a series of 10M elements.
The 300-in-1M search is not FFT-based; the 30K-in-10M search uses FFT.
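A minimal sketch, under my own assumptions about the normalization, of why the large case becomes feasible with FFT: the sliding dot products of the pattern against every window form one cross-correlation, computable in O(n log n), while sliding means and variances come from cumulative sums; together they yield the sliding Pearson correlation.

```python
import numpy as np
from scipy.signal import fftconvolve

def sliding_pearson_fft(series: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Sliding Pearson r via FFT cross-correlation plus running moments."""
    m = len(pattern)
    # Cross-correlation = convolution with the reversed pattern.
    dots = fftconvolve(series, pattern[::-1], mode="valid")   # sum(x_i * y_i)
    csum = np.concatenate(([0.0], np.cumsum(series)))
    csum2 = np.concatenate(([0.0], np.cumsum(series ** 2)))
    win_mean = (csum[m:] - csum[:-m]) / m                     # sliding means
    win_var = (csum2[m:] - csum2[:-m]) / m - win_mean ** 2    # sliding variances
    p_mean, p_std = pattern.mean(), pattern.std()
    cov = dots / m - win_mean * p_mean
    return cov / (np.sqrt(np.maximum(win_var, 1e-12)) * p_std)

rng = np.random.default_rng(0)
series = rng.standard_normal(10_000_000)   # placeholder for the 10M series
pattern = rng.standard_normal(30_000)      # placeholder for the 30K pattern
# r = sliding_pearson_fft(series, pattern)  # seconds instead of minutes
```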
Impressive result!
I took a sample from 2010 to 2023 (47k rows), divided it into 3 parts in chronological order, and decided to see what happens if these parts are swapped.
The subsample sizes are: train - 60%, test - 20%, exam - 20%.
I made these combinations; (-1) is the standard, chronological, order. Each subsample has its own colour.
I trained 101 models with a different Seed for each set of samples and got the following result.
All the metrics are standard, and it can be seen that it is difficult to judge by the average profit of the models (AVR Profit), as well as by the percentage of models whose profit exceeds 3000 points on the last subsample, which did not take part in training.
Perhaps the relative success of the -1 and 0 variants comes down to the size of the training sample? In general, it seems that Recall reacts to this.
In your opinion, should the results of such combinations be comparable to each other in our case? Or is the data irretrievably outdated?
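A rough sketch of the described experiment, with my own stand-ins (a generic scikit-learn booster instead of the original tooling, random placeholder data scaled down for speed, and accuracy in place of the profit metrics): split chronologically 60/20/20, permute which part plays which role, and train 101 models that differ only in the seed.

```python
from itertools import permutations

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((4_700, 10))        # placeholder data (scaled down 10x)
y = (rng.random(len(X)) > 0.5).astype(int)  # placeholder labels

i60, i80 = int(0.6 * len(X)), int(0.8 * len(X))
parts = [(X[:i60], y[:i60]), (X[i60:i80], y[i60:i80]), (X[i80:], y[i80:])]

# Enumerate all role assignments; -1 is the standard chronological order.
for combo, order in enumerate(permutations(range(3)), start=-1):
    (Xtr, ytr), (Xte, yte), (Xex, yex) = (parts[j] for j in order)
    # The test part would be used for model selection; omitted for brevity.
    scores = []
    for seed in range(101):                 # 101 models differing only in seed
        model = GradientBoostingClassifier(n_estimators=50, random_state=seed)
        model.fit(Xtr, ytr)
        scores.append(model.score(Xex, yex))  # stand-in for the profit metrics
    print(f"combination {combo} {order}: mean exam accuracy {np.mean(scores):.3f}")
```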
Another do-it-yourself approach...
There is cross-validation; it has been chewed over again and again..., and is widely used....
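For reference, a minimal sketch of that standard alternative: walk-forward cross-validation on chronological data via scikit-learn's TimeSeriesSplit, which never lets a fold train on data from its own future. Data and model here are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 10))        # placeholder chronological data
y = (rng.random(len(X)) > 0.5).astype(int)  # placeholder labels

tscv = TimeSeriesSplit(n_splits=5)          # expanding train window
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    print(f"fold {fold}: trained on {len(train_idx)} rows, "
          f"accuracy {model.score(X[test_idx], y[test_idx]):.3f}")
```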