Machine learning in trading: theory, models, practice and algo-trading - page 2798

 
Uladzimir Izerski #:

I notice now that you've updated the post.

Moreover, I answered your question a long time ago too)))


Uladzimir Izerski #:

You are looking for nothing but errors. This is the problem of the ML dreamers. Searching for errors, not for concrete forecasting results.

You have a very, very narrow view of the concept of "error".

In one case it was the error of an indicator forecast,

in another case, for example, the error could be the balance curve, or the deviation from some ideal equity,

or the dynamics of capital growth,

or, for example, the number of mistakes the algorithm makes (the count of algorithm mistakes is also an error),

or you can watch the algorithm/robot with your own eyes and tell it (via code/a button) - I like this one, don't do that one - and that can be considered an error too...

And there are millions of different variations; even in what you do, you also have some criterion (good/bad). That is also an error.

The fact that you don't realise it, don't admit it - well, that is on you...


An error is a good/bad criterion expressed as a number.

 

Uladzimir is once again trying, with his childishly inquisitive mind, to grasp meanings that are too big for him and to discredit the participants of this thread, namely these meanings:

Equation reconstruction: the "black box".

A "black box" is both the most difficult and the most "seductive" formulation of a modelling problem, where there is no a priori information about the object, and hence about the structure of the model. The intrigue lies in the fact that a model capable of reproducing the observed behaviour, or of giving a forecast of further evolution, has to be obtained from the observed series alone, i.e. practically "from nothing". The chances of success are small, but in case of luck a "good" model becomes a very valuable tool for characterising the object and understanding the "mechanisms" of its functioning: "almost a bluff can lead to a big win". The absence of a priori information forces one to use universal structures of model equations, for example artificial neural networks, radial basis functions, algebraic polynomials, etc. Such models often turn out to be multidimensional and contain a lot of unknown parameters.


PS: and more than a year ago he wrote that he himself uses neural networks and that all his TSs are built on them... such a clown it's beyond words.

#870

#1826

Our dear Uladzimir learnt neural networks very quickly, it turns out - in just 2 months.

If only - two years earlier he was already getting signals from an NS.

#5758

So the guy talks nonsense in a drunken stupor and lies, lies, lies... and only discredits himself and his crazy indicators.

 

It turns out to be a pretty good way to test a model, via the labelling-window shift. If many lags, rather than zero, give better statistics (like MI), then the model is built on randomness.

Otherwise it would be unambiguous: there can't be that many profitable TSs at once.

I tried selecting by std following Sanych's instructions - roughly the same picture. But I selected sets of features, not each one separately.

I will look at the features separately as well.

 
>>> # get_prices() / labeling_shift() are the poster's own helpers (not shown here):
>>> # the former loads the feature/label DataFrame, the latter re-labels it with the window shifted by `shift` bars
>>> import pandas as pd
>>> from sklearn.feature_selection import mutual_info_classif
>>> results = []
>>> for i in range(25):
...     pr = get_prices()
...     pr = labeling_shift(pr, shift=i)
...     mi = mutual_info_classif(pr[pr.columns[1:-1]], pr[pr.columns[-1]])
...     results.append([mi.mean(), mi.std(), mi, i])
...     print(i)
... 
>>> results.sort()
>>> results[-1]
>>> pd.DataFrame(results)

The best score is at lag 9, but the std is better at lag 0:

(tail of the sorted DataFrame, ascending by MI mean; columns: MI mean, MI std, per-feature MI, lag)

15  0.002473  0.002008  [0.0, 0.0, 0.0, 7.738132773948969e-05, 0.00428...   0
16  0.002482  0.002536  [0.0027194272625081783, 0.004082692968791601, ...   4
17  0.002544  0.002137  [0.00016451381901605444, 0.003159073557252867,...   2
18  0.002678  0.002174  [0.0, 0.0015686230398428425, 0.000974887322880...   8
19  0.002749  0.001978  [0.0, 0.001425018820565338, 0.0, 0.0, 0.002788...   1
20  0.002793  0.002378  [0.00535509344523355, 0.0, 0.00400320235670181...  12
21  0.002897  0.002330  [0.00406257691063705, 0.001421806087245736, 0....  13
22  0.003113  0.002501  [0.0, 0.0, 0.004822852461999094, 3.66068989796...  10
23  0.003195  0.002560  [0.0024128008240758003, 0.0, 0.001845732619932...  11
24  0.003255  0.002406  [0.0, 0.0, 0.0034648745240082324, 0.0063568287...   9

OK, let's look at the backtests of both. The 9th:

The 0th:

The 0th, logically, has a lower spread of values on the backtest, because the std of the MI is smaller to begin with. But it doesn't affect the results on OOS, and neither does the higher MI (mutual information) value.

Okay, suppose we are overfitting on a large number of features (14 features here).

MA_PERIODS = [i for i in range(10, 150, 10)]  # 14 periods: 10, 20, ..., 140 (one feature per period)

Let's look at the statistics of each feature and pick only the best ones for the model with 9 lags:

>>> results[24]
[0.003255328338368026, 0.002405621052220332, array([0.        , 0.        , 0.00346487, 0.00635683, 0.00490859,
       0.        , 0.00305732, 0.00268664, 0.00877952, 0.00305562,
       0.00138638, 0.00320064, 0.00415751, 0.00452067]), 9]
>>> 

Some features are zeroed out altogether, i.e. carry no information. Let's keep only those with MI greater than 0.004 and train on them:

>>> per = results[24][2]
>>> np.array(per) > 0.004
array([False, False, False,  True,  True, False, False, False,  True,
       False, False, False,  True,  True])
>>> 

MA_PERIODS = [40, 50, 90, 130, 140] - the selected features
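For reference, a minimal sketch (reusing the names from the snippets above) of how the boolean mask maps back onto the period list:

>>> import numpy as np
>>> MA_PERIODS = [i for i in range(10, 150, 10)]       # the original 14 periods: 10, 20, ..., 140
>>> per = results[24][2]                               # per-feature MI scores for the lag-9 labelling
>>> list(np.array(MA_PERIODS)[np.array(per) > 0.004])  # keep the periods whose MI passed the threshold
[40, 50, 90, 130, 140]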

Training and testing:

Meh.

Conclusion: a higher mutual-information score on the training sample does not help improve the model on the test sample.

But this can be used to squeeze out fractions of a percent in competitions, which is what the professionals say: pre-selection of features for modern models like boosting gives almost nothing.

 
Maxim Dmitrievsky #:

It turns out to be a pretty good way to test a model, via the labelling-window shift. If many lags, rather than zero, give better stats (like MI), then the model is built on randomness.

Otherwise it would be unambiguous: there can't be that many profitable TSs at once.

I tried selecting by std following Sanych's instructions - roughly the same picture. But I selected sets of features, not each one separately.

I will look at the features separately as well.

The std of a numerical estimate of the relationship between a feature and the target?

First you have to remove the correlated ones. For some reason, the optimal correlation threshold on my features is 75%.

Then select 5-10 features with the maximum score.

Draw pictures, as in my post above, to make sure the relationship separates the classes.

The prediction error should be less than 30%. If it is not, the features will have to be discarded.
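A minimal sketch of that selection procedure as I read it (the 0.75 correlation threshold and the 5-10 count come from the post; using mutual information as the "score" is my assumption, since the relationship estimate is not named):

import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def select_features(X: pd.DataFrame, y: pd.Series, corr_limit=0.75, top_k=10):
    # 1) drop features that are too correlated with one already kept
    corr = X.corr().abs()
    kept = []
    for col in X.columns:
        if all(corr.loc[col, k] < corr_limit for k in kept):
            kept.append(col)
    # 2) score the survivors against the target and keep the top_k
    scores = mutual_info_classif(X[kept], y)
    order = np.argsort(scores)[::-1][:top_k]
    return [kept[i] for i in order]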

 
СанСаныч Фоменко #:

The std of a numerical estimate of the relationship between a feature and the target?

First you have to remove the correlated ones. For some reason, the optimal correlation threshold on my features is 75%.

Then select 5-10 features with the maximum score.

Draw pictures, as in my post above, to make sure the relationship separates the classes.

The prediction error should be less than 30%. If it is not, the features will have to be discarded.

You can see from the entropy there that the relationship is negligible (0.003), whereas it should tend towards 1.

But I estimated the difference, so it doesn't matter much. There should still have been a small improvement; maybe there isn't one because the difference is minimal.

In general, even if everything is fine (there are good features in the set), you don't have to remove the rest of the features.

It's just groundwork for other things, like a non-standard window.
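On the "should tend to 1" scale, a small sketch (my own, not from the thread) of normalising the MI estimate by the entropy of the labels, so that 0 means no relationship and 1 means the target is fully determined by the feature:

import numpy as np
from scipy.stats import entropy
from sklearn.feature_selection import mutual_info_classif

def normalized_mi(X, y):
    # MI is bounded above by the entropy of the labels, so dividing by it gives a 0..1 score
    mi = mutual_info_classif(X, y)
    _, counts = np.unique(y, return_counts=True)
    return mi / entropy(counts / counts.sum())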
 
Maxim Dmitrievsky #:

You can see from the entropy there that the relationship is negligible (0.003), whereas it should tend towards 1.

But I estimated the difference, so it doesn't matter much. There should still have been a small improvement; maybe there isn't one because the difference is minimal.

In general, even if everything is fine (there are good features in the set), the rest of the features don't need to be removed.

The score itself is a relative thing.

I'll repeat the pictures.

Bad, hopeless:


Better; if there are several features like this, we can talk about a 30% prediction error.



And the rubbish should be removed, because on the training set the chips may fall in favour of a rubbish feature - it is easier to find a value in it that leads to the optimum.
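For the "pictures" mentioned above, a minimal sketch (my own, not SanSanych's code) of plotting the class-conditional distributions of a single feature to see whether it separates the classes:

import matplotlib.pyplot as plt
import pandas as pd

def plot_class_split(X: pd.DataFrame, y: pd.Series, feature: str):
    # overlay the feature's histogram for each class; a useful feature separates them visibly
    for cls in sorted(y.unique()):
        plt.hist(X.loc[y == cls, feature], bins=50, alpha=0.5, density=True, label=f"class {cls}")
    plt.title(feature)
    plt.legend()
    plt.show()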

 
Maxim Dmitrievsky #:

pre-selection of features for modern boosting-type models gives almost nothing.

Boosting looks for the best splits across all columns and all examples, i.e. it uses the best features.
A random forest takes half of the features and half of the examples (the fraction is configurable) for each tree and then averages over 20-100 trees. If only 5 out of 200 features are informative, then some of the trees will contain no informative features at all (on average 2.5 informative features per tree), and we will be averaging informative trees together with noise trees. The result will also be very noisy.
A random forest works well if there are many informative features (as in classical examples / ML tasks).

Boosting will find and use the most informative features, since it checks them all. So, by the logic of boosting, it will select the best features by itself. But boosting has its own problems too.
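As an illustration of that setup (a sketch with synthetic data; the sklearn models and parameters are my choice, a specific library is not named above): 5 informative features out of 200, a forest whose trees each see half the columns versus boosting that scans all columns at every split.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 200 features, only 5 of them informative - the situation described above
X, y = make_classification(n_samples=2000, n_features=200, n_informative=5,
                           n_redundant=0, random_state=0)

# random forest: each tree is offered only half of the columns
rf = RandomForestClassifier(n_estimators=100, max_features=0.5, random_state=0)

# boosting: every split is chosen over all columns
gb = GradientBoostingClassifier(n_estimators=100, random_state=0)

print("forest  :", cross_val_score(rf, X, y, cv=5).mean())
print("boosting:", cross_val_score(gb, X, y, cv=5).mean())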

 
elibrarius #:

Boosting will find and use the most informative features, since it checks them all. So, by the logic of boosting, it will select the best features by itself. But boosting has its own problems too.

I created a thread with a sample dataset that proves the opposite - boosting is not omnipotent, especially out of the box.

 
elibrarius #:

Boosting searches for the best splits across all columns and all examples, i.e. it uses the best features.
A random forest takes half of the features and half of the examples (the fraction is configurable) for each tree and then averages over 20-100 trees. If only 5 out of 200 features are informative, then some of the trees will contain no informative features at all (on average 2.5 informative features per tree), and we will be averaging informative trees together with noise trees. The result will also be very noisy.
A random forest works well if there are many informative features (as in classical examples / ML tasks).

Boosting will find and use the most informative features, since it checks them all. So, by the logic of boosting, it will select the best features by itself. But boosting has its own problems too.

I can't agree with you about boosting.

Boosting will find features that have a strong relationship (predictive power) with the target - I believe that. Everything is fine as long as the magnitude of that relationship is constant. But by giving up the estimation of the feature itself, in boosting we cannot track the variability of the magnitude of that relationship, and according to my data the SD of the relationship estimate can range from 10% to 120% (on my features). What does boosting give us then? After all, we need to weed out the features whose relationship varies the most.
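A minimal sketch of what tracking that variability might look like (my interpretation: re-estimate the feature-target relationship on sliding windows and inspect its SD per feature; the estimator is not specified above, so plain correlation is used here for simplicity):

import pandas as pd

def relationship_stability(X: pd.DataFrame, y: pd.Series, window=500, step=100):
    # re-estimate the feature-target relationship on sliding windows
    rows = []
    for start in range(0, len(X) - window + 1, step):
        sl = slice(start, start + window)
        rows.append(X.iloc[sl].corrwith(y.iloc[sl]).abs())
    est = pd.DataFrame(rows)
    # mean strength and its variability (SD) per feature over time
    return pd.DataFrame({"mean": est.mean(), "sd": est.std()})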
