Discussion of article "Advanced resampling and selection of CatBoost models by brute-force method" - page 14
In short, I don't know, maybe my GMM is wrong :) But I see no difference between using it and not using it. In my opinion, everything is decided by the target and nothing else...
I have 60k samples in total.
I take the first 10k and randomly select 500 points from them.
I either train the model on them directly, or fit the GMM on them first and then train the model on samples generated by it.
Then I test on the remaining 50k.
Even the usual way, without GMM, you can find models as good as those trained with GMM, and they turn up just as often.
For example:
a model without GMM, trained on 500 points and tested on 50k.
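The evaluation protocol above can be pinned down as an index split, so that the plain and GMM-based variants are trained and tested on exactly the same rows. A minimal sketch, assuming the data is a time-ordered array with one row per sample; make_split is a hypothetical helper, not code from the article:

```python
import numpy as np

def make_split(n_total=60_000, n_pool=10_000, n_train=500, seed=0):
    """Return (train_idx, test_idx) for the protocol described in the post.

    train_idx: n_train indices drawn at random, without replacement, from the first n_pool rows.
    test_idx:  every row after the first n_pool rows (out-of-sample test period).
    """
    rng = np.random.default_rng(seed)
    train_idx = np.sort(rng.choice(n_pool, size=n_train, replace=False))
    test_idx = np.arange(n_pool, n_total)
    return train_idx, test_idx

train_idx, test_idx = make_split()
print(len(train_idx), len(test_idx))  # 500 50000
```

Both variants would then be built from train_idx (directly, or via a generator fitted on those rows) and scored on test_idx, so any difference in results comes from the resampling rather than from the split.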
=================================================================================================
I noticed something interesting to think about...
There is a view that the market should be divided into states, with a different strategy traded in each state, but all the attempts I know of were unsuccessful: either the state is not detected, or the model trades badly even within what is supposedly a single state.
With this approach, though, you can see quite clearly which market conditions the model "likes" and which it doesn't.
Probably because the features are returns relative to moving averages, the model works better in a flat market.
You can split the history into states manually and feed those periods into the training set. Then you need to balance the examples by "state", or create artificial ones via GMM.
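A minimal sketch of the two balancing options mentioned here, assuming the states have already been labeled by hand as integers (e.g. 0 = flat, 1 = trend). balance_by_state is a hypothetical helper; its synthetic branch fits a single Gaussian per state, a simplified stand-in for a per-state GMM such as scikit-learn's GaussianMixture:

```python
import numpy as np

def balance_by_state(X, states, n_per_state, synthetic=False, seed=0):
    """Return a training set with exactly n_per_state rows for every state label.

    synthetic=False: resample real rows of each state (with replacement if the state is short).
    synthetic=True:  draw artificial rows from a Gaussian fitted to each state
                     (one-component stand-in for a per-state GMM).
    """
    rng = np.random.default_rng(seed)
    X_parts, s_parts = [], []
    for s in np.unique(states):
        Xs = X[states == s]
        if synthetic:
            cov = np.atleast_2d(np.cov(Xs, rowvar=False))
            rows = rng.multivariate_normal(Xs.mean(axis=0), cov, size=n_per_state)
        else:
            idx = rng.choice(len(Xs), size=n_per_state, replace=len(Xs) < n_per_state)
            rows = Xs[idx]
        X_parts.append(rows)
        s_parts.append(np.full(n_per_state, s))
    return np.vstack(X_parts), np.concatenate(s_parts)

# Toy example: 800 bars labeled "flat" (0) and 200 labeled "trend" (1).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
states = np.where(np.arange(1000) < 800, 0, 1)
Xb, sb = balance_by_state(X, states, n_per_state=300)
print(np.bincount(sb))  # [300 300]
```

Resampling keeps only real bars but repeats the rarer state; the synthetic branch avoids exact duplicates at the cost of assuming the chosen distribution describes that state well.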
Yes, you can detect states with an HMM, but they will all be recognised over a sliding window, and therefore with a lag equal to the window size, and so on... :)
I just noticed that the states are really clearly visible here, and it seemed interesting.
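To illustrate the lag described here, the sketch below uses a much simpler window-based detector instead of an HMM: a trailing rolling standard deviation with a fixed threshold. The 50-bar window, the 2.0 threshold and the synthetic regime switch are all assumptions chosen for illustration. Any state estimate built from a trailing window can only flag the new regime once enough of the window has been filled with it:

```python
import numpy as np

def rolling_std(x, window):
    """Standard deviation over a trailing window; the value at t uses x[t-window+1 .. t]."""
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    out = np.full(len(x), np.nan)
    out[window - 1:] = windows.std(axis=1)
    return out

# Synthetic series: a calm regime (std 1) switches to a volatile one (std 3) at bar 500.
rng = np.random.default_rng(42)
change, window = 500, 50
x = np.concatenate([rng.normal(0, 1, change), rng.normal(0, 3, 500)])

vol = rolling_std(x, window)
detected = int(np.argmax(vol > 2.0))  # first bar the detector labels as "volatile"
print("lag in bars:", detected - change)
```

The lag is bounded by the window length; a shorter window reacts faster but makes the state estimate noisier.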
How come? I never got results like that with a plain model. Maybe it's possible with several moving averages.
That was with GMM; I've been trying different things, this and that.
I'm obsessed with the idea of creating a training sample by optimising distributions or functions.
That is, without using any sample at all, just generate "something" and test it on real data.
But I don't know how to implement it yet.
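One possible reading of this idea, sketched under heavy simplifying assumptions: the generator is two 1-D Gaussians whose means are the only parameters, the "model" is a nearest-class-mean rule trained purely on synthetic data, and the generator is tuned by random search, using real data only to score it. The real data here is itself a toy stand-in, and accuracy_on_real is a hypothetical name:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" data: one feature, two classes (stand-in for market features/labels).
X_real = np.concatenate([rng.normal(-1.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
y_real = np.repeat([0, 1], 500)

def accuracy_on_real(params, n_synth=200):
    """Generate a synthetic sample from params, fit a trivial model on it, score it on real data."""
    mu0, mu1 = params
    Xs = np.concatenate([rng.normal(mu0, 1.0, n_synth), rng.normal(mu1, 1.0, n_synth)])
    ys = np.repeat([0, 1], n_synth)
    # Trivial "model": nearest class mean, learned from the synthetic sample only.
    m0, m1 = Xs[ys == 0].mean(), Xs[ys == 1].mean()
    pred = (np.abs(X_real - m1) < np.abs(X_real - m0)).astype(int)
    return float((pred == y_real).mean())

# Random search over the generator's parameters; real data is used only for scoring.
candidates = rng.uniform(-3, 3, size=(200, 2))
scores = [accuracy_on_real(p) for p in candidates]
best = candidates[int(np.argmax(scores))]
print("best generator params:", best, "accuracy:", max(scores))
```

Because the real data drives the search, it effectively becomes training data for the generator, so a separate hold-out period would still be needed to judge the result.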
=====================================================
I also have an idea to improve quality by removing bad trees from the model; that may help too.
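A generic sketch of "removing bad trees": greedy backward elimination on a validation set, assuming each tree's raw contribution per validation sample is available as one column of a matrix (for CatBoost this could presumably be derived from differences of consecutive staged_predict(..., prediction_type='RawFormulaVal') outputs, which is untested here). prune_trees and the toy data are hypothetical:

```python
import numpy as np

def logloss(raw, y):
    p = np.clip(1.0 / (1.0 + np.exp(-raw)), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def prune_trees(contrib, y):
    """Greedy backward elimination of trees from an additive ensemble.

    contrib: array (n_samples, n_trees); each column is one tree's raw contribution
             on a validation set, and the model's raw score is the row sum.
    Returns (indices of trees kept, validation logloss after pruning).
    """
    keep = list(range(contrib.shape[1]))
    best = logloss(contrib[:, keep].sum(axis=1), y)
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for t in list(keep):
            trial = [k for k in keep if k != t]
            loss = logloss(contrib[:, trial].sum(axis=1), y)
            if loss < best:  # drop tree t only if validation loss improves
                best, keep, improved = loss, trial, True
                break
    return keep, best

# Toy ensemble: 10 trees that push toward the label, 5 that push away from it.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
signal = np.where(y == 1, 1.0, -1.0)
good = 0.1 * signal[:, None] + rng.normal(0, 0.05, (300, 10))
bad = -0.1 * signal[:, None] + rng.normal(0, 0.05, (300, 5))
contrib = np.hstack([good, bad])
kept, loss = prune_trees(contrib, y)
print(sorted(kept), loss)
```

Pruning is itself fitted to the validation set, so its effect should be confirmed on data not used for the selection.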
A curious approach, for class balancing. It could be adapted for our purposes. I just came across it.
https://towardsdatascience.com/augmenting-categorical-datasets-with-synthetic-data-for-machine-learning-a25095d6d7c8
I tried to integrate this approach into the clusterer from the article, not as a class-balancing method but as a generator of a new balanced dataset.
It has a nice method for calculating the Mahalanobis distance between two one-dimensional arrays. The article describes it as a multivariate generalisation of how many standard deviations a sample lies from the mean of a distribution.
I haven't fully explored this metric yet, but the author suggests using it to assess whether generated features belong to a particular class.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.mahalanobis.html
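To calculate it, we need two one-dimensional arrays and the inverse of the covariance matrix (scipy's mahalanobis(u, v, VI) takes the inverse, VI, rather than the covariance itself).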
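For reference, the quantity scipy's function computes is D(u, v) = sqrt((u - v)^T VI (u - v)), with VI the inverse covariance. A numpy-only sketch of the same formula, with two cases that can be checked by hand:

```python
import numpy as np

def mahalanobis(u, v, VI):
    """sqrt((u - v)^T VI (u - v)); VI is the INVERSE covariance matrix, as in scipy."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sqrt(d @ VI @ d))

# Identity covariance: the distance reduces to the Euclidean one.
print(mahalanobis([3, 4], [0, 0], np.eye(2)))  # 5.0

# Variance 4 on the first feature: a shift of 2 there is only one standard deviation.
cov = np.diag([4.0, 1.0])
print(mahalanobis([2, 0], [0, 0], np.linalg.inv(cov)))  # 1.0
```

The second case shows the "number of standard deviations" reading: the same raw offset counts for less along a feature with larger spread.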
In our case, the first array is a generated sample of features and the second is the vector of mean feature values from the GMM. The covariance matrix is also taken from the GMM. A separate GMM is fitted for each class. The mean and standard deviation of each feature, and the labels, are also prepared; they are needed to generate the new data.
Now everything is ready for generating and selecting new data. For each class, features are generated randomly from the mean and standard deviation, more than 60 times as many as the requested number, so that there is something to choose from. The labels are converted to 0/1.
Then, for each generated sample, the Mahalanobis distance to the GMM mean vector of each of the two classes is calculated. This gives an array of two values showing how close the sample is to each class; the minimum indicates the nearest class. If it matches the sample's label, the sample is added to the training set. Once the set reaches the specified size for that class, we move on to the next class. This way we get a perfectly balanced sample.
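A minimal sketch of the generate-and-select loop described above, under stated simplifications: each class is modeled by a single Gaussian (mean, per-feature std and inverse covariance from numpy) instead of a per-class GMM, candidates are drawn independently per feature from mean and std, labels are already 0/1, and generate_balanced is a hypothetical name:

```python
import numpy as np

def mahalanobis(u, v, VI):
    d = u - v
    return float(np.sqrt(d @ VI @ d))

def class_stats(X, y):
    """Per class: mean vector, per-feature std, inverse covariance.
    A single Gaussian per class stands in for the per-class GMM used in the post."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        stats[int(c)] = (Xc.mean(axis=0), Xc.std(axis=0), np.linalg.inv(cov))
    return stats

def generate_balanced(X, y, n_per_class, oversample=60, seed=0):
    """Generate n_per_class synthetic rows per class, keeping a candidate only if its
    nearest class by Mahalanobis distance is the class it was generated for."""
    rng = np.random.default_rng(seed)
    stats = class_stats(X, y)
    classes = sorted(stats)
    X_out, y_out = [], []
    for label in classes:
        mean, std, _ = stats[label]
        # Oversample candidates so there is something to choose from.
        candidates = rng.normal(mean, std, size=(n_per_class * oversample, X.shape[1]))
        kept = 0
        for x in candidates:
            dists = [mahalanobis(x, stats[c][0], stats[c][2]) for c in classes]
            if classes[int(np.argmin(dists))] == label:
                X_out.append(x)
                y_out.append(label)
                kept += 1
                if kept == n_per_class:
                    break  # quota reached: move on to the next class
        # If the candidates run out first, this class ends up short; raise oversample.
    return np.array(X_out), np.array(y_out)

# Toy data: two classes, 3 features, labels already 0/1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (500, 3)), rng.normal(2.0, 1.0, (500, 3))])
y = np.repeat([0, 1], 500)
X_bal, y_bal = generate_balanced(X, y, n_per_class=100)
print(X_bal.shape, np.bincount(y_bal))
```

Because candidates that land closer to the other class are discarded, the kept samples are biased toward each class's core, which is what makes the resulting set both balanced and cleanly separated.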
But this does not get rid of the tambourine dancing and the complicated relationship with randomness. Still, if you try hard enough, you can get a decent result:
If I have the time and energy, I will try feeding the generator feature distributions restricted to the 25th to 75th percentiles; maybe that will give something.
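One way the interquartile idea could be tried, assuming it means drawing each feature only from the real values between its 25th and 75th percentiles. sample_interquartile is a hypothetical helper; empirical resampling is one choice among several (a uniform draw between the bounds would be another):

```python
import numpy as np

def sample_interquartile(X, n, seed=0):
    """Draw n synthetic rows; each feature is resampled independently from the
    real values lying between that feature's 25th and 75th percentiles."""
    rng = np.random.default_rng(seed)
    lo, hi = np.percentile(X, [25, 75], axis=0)
    cols = []
    for j in range(X.shape[1]):
        col = X[:, j]
        core = col[(col >= lo[j]) & (col <= hi[j])]
        cols.append(rng.choice(core, size=n, replace=True))
    return np.column_stack(cols), lo, hi

rng = np.random.default_rng(3)
X = rng.standard_t(df=3, size=(2000, 3))  # heavy-tailed toy features
X_syn, lo, hi = sample_interquartile(X, 1000)
print(X_syn.shape, lo.round(2), hi.round(2))
```

Sampling each feature independently discards correlations between features; to keep them, whole rows whose features all fall inside their bounds could be resampled instead.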
I also tried using the distance indicator to evaluate the choice of target and features. The idea was that with a correctly chosen target and features, the average value of this indicator would decrease.
I ran all the "successful" target and feature combinations I had, and also reproduced "unsuccessful" ones. In this cursory analysis, the indicator decreases for the successful variants and increases for the unsuccessful ones. There may be some correlation, but it needs checking. If you have a grid-scanner implementation or a GA (genetic algorithm), you can check it.
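A sketch of how such a check might look, assuming the score is the average Mahalanobis distance of each row to its own class mean. Note one design choice: with each class's own sample covariance, the squared distances of a class's rows to its own mean always sum to (n - 1) * d (n rows, d features), so that version cannot tell good labels from bad ones. The sketch therefore uses one covariance pooled over all rows, which differs from the GMM-based variant in the post:

```python
import numpy as np

def mean_distance_to_own_class(X, y):
    """Average Mahalanobis distance from each row to the mean of its own class,
    using one covariance matrix pooled over the whole dataset."""
    VI = np.linalg.inv(np.cov(X, rowvar=False))
    total = 0.0
    for c in np.unique(y):
        D = X[y == c] - X[y == c].mean(axis=0)
        # Row-wise sqrt(d^T VI d) without an explicit Python loop.
        total += np.sqrt(np.einsum("ij,jk,ik->i", D, VI, D)).sum()
    return total / len(X)

rng = np.random.default_rng(0)
y_good = np.repeat([0, 1], 500)
X = rng.normal(size=(1000, 2)) + 3.0 * y_good[:, None]  # labels that really separate the data
y_bad = rng.permutation(y_good)                          # same label counts, unrelated to X
d_good = mean_distance_to_own_class(X, y_good)
d_bad = mean_distance_to_own_class(X, y_bad)
print(d_good, d_bad)
```

Looping this score over target/feature combinations and comparing it with out-of-sample results would be the "check" suggested in the post.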
Not bad, it really got you hooked.
No scanner yet. Great, I'll have to take a close look. In the meantime, I've been gathering information on additional approaches that can improve the model (besides encoders). I'll probably write it up as an article soon.
Regarding the combining of successful models during the search that you mentioned: I tried combining successful models with different features. This evens out the drawdown in some parts of the history. I also noticed that adding models with R^2 from 0.65 upwards improves the results, even when there are models with R^2 of 0.85-0.95.
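Why combining can even out drawdowns, in a toy form: if two models lose in different stretches of history, an equal-weight combination of their per-bar P&L has a shallower worst dip. The P&L numbers are invented for illustration, and averaging P&L is an assumption about how the models are combined:

```python
import numpy as np

def max_drawdown(returns):
    """Largest peak-to-trough drop of the cumulative P&L curve (starting from 0)."""
    equity = np.concatenate([[0.0], np.cumsum(returns)])
    peak = np.maximum.accumulate(equity)
    return float((peak - equity).max())

# Toy per-bar P&L of two models whose losing stretches fall in different places.
a = np.array([1, 1, -2, 1, 1, 1], dtype=float)
b = np.array([1, -2, 1, 1, 1, 1], dtype=float)
combined = (a + b) / 2  # equal-weight combination of the two models

print(max_drawdown(a), max_drawdown(b), max_drawdown(combined))  # 2.0 2.0 1.0
```

The benefit depends on the models' losses not coinciding; models trained on different features are more likely to fail at different times, which fits the observation above.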
Yes, but often at the cost of 10-20% fewer trades.
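One plausible mechanism for the lower trade count, assuming the ensemble averages the models' class probabilities and a trade is opened only past fixed thresholds (0.6 and 0.4 here are illustrative): where models disagree, the average drifts toward 0.5 and the bar is skipped. count_trades and the probabilities are hypothetical:

```python
import numpy as np

def count_trades(p, buy=0.6, sell=0.4):
    """Number of bars where the probability is confident enough to open a trade."""
    p = np.asarray(p)
    return int(((p > buy) | (p < sell)).sum())

p1 = np.array([0.80, 0.20, 0.70, 0.65, 0.30])
p2 = np.array([0.70, 0.30, 0.35, 0.62, 0.55])
avg = (p1 + p2) / 2  # ensemble probability

print(count_trades(p1), count_trades(p2), count_trades(avg))  # 5 4 3
```

The skipped bars are exactly those where the models contradict each other, which is also why the combination tends to smooth the equity curve.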