Is there a pattern to the chaos? Let's try to find it! Machine learning on the example of a specific sample. - page 17

 
Valeriy Yastremskiy #:

Other than the start time of something and the end time (sessions, calendar), nothing comes to mind. What do you mean?

Why? Those are just the most obvious ones. Volatility was in the figure, and there are other factors to consider.

Roughly speaking, you have to take some market metric and work with it on its own, I guess, rather than dumping everything into one pile. Everybody wanted to pile everything in, but it turns out it doesn't work that way.
 
Maxim Dmitrievsky #:
Why? Those are just the most obvious ones. Volatility was in the figure, and there are other factors to consider.

Roughly speaking, you have to take some market metric and work with it on its own, I guess, rather than dumping everything into one pile. Everybody wanted to pile everything in, but it turns out it doesn't work that way.
With discrete price metrics nothing is simple: taken separately, a metric often loses its connection and meaning. Channel width, or volatility and the speed of price change, for instance. It seems simple, but I have no concept in my head ))))
 
Valeriy Yastremskiy #:
With discrete price metrics nothing is simple: taken separately, a metric often loses its connection and meaning. Channel width, or volatility and the speed of price change, for instance. It seems simple, but I have no concept in my head ))))

First we must define what we want from ML. Is it just classification or some kind of selection, i.e. a tool like a standard optimiser where different target functions can be set?

Then use it to manipulate indicators (features). Often crudely, by brute force, until something interesting turns up (like the Grail).

Once we've learnt something, we start digging deeper.

I keep thinking I'm writing obvious things, but they turn out to be obvious only to me :D

 
Renat Akhtyamov #:

The thread's question is certainly an interesting one...

That's why I was wondering.

Perhaps a pattern can be identified.

I suggest analysing several bars in a row, for example 3-4.

Then move one bar from the beginning of this sample of 3-4 bars and analyse again.

As if overlaying one sample on another.

It is possible to find a pattern.

Like this:


What would that give? The bars will always be different, unless you tie them to the current scale. I think a slightly different approach is needed, for example colour (polarity). Stationarity does occur, but in a slightly different form, and it is also easily destroyed.
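The two ideas above (overlapping windows of 3-4 bars shifted by one bar, and encoding bars by colour rather than by size) can be combined in a minimal sketch. The function names, the +1/-1/0 encoding and the "next bar is bullish" statistic are assumptions made for illustration; nothing here is a tested trading rule.

```python
from collections import Counter

def bar_colours(opens, closes):
    """Encode each bar as +1 (bullish), -1 (bearish) or 0 (close == open)."""
    return [(c > o) - (c < o) for o, c in zip(opens, closes)]

def overlapping_windows(seq, width=3):
    """Every run of `width` consecutive items, each shifted by one bar."""
    return [tuple(seq[i:i + width]) for i in range(len(seq) - width + 1)]

def pattern_stats(opens, closes, width=3):
    """Map each colour pattern to (times followed by a bullish bar, times seen)."""
    colours = bar_colours(opens, closes)
    seen, up_next = Counter(), Counter()
    windows = overlapping_windows(colours, width)
    # The last window has no following bar, so it is skipped.
    for i, window in enumerate(windows[:-1]):
        seen[window] += 1
        if colours[i + width] > 0:
            up_next[window] += 1
    return {w: (up_next[w], seen[w]) for w in seen}
```

Because only polarity is kept, patterns repeat regardless of scale, which addresses the "bars will always be different" objection; whether the resulting frequencies are stable is a separate question.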
 

In the meantime, I got this model from the first sample in this thread.

Balance

Unfortunately, it fails on the test sample - well, obviously the samples are different.

 

In the process of finding a way to train a model, different approaches were tested, more than 100 thousand models were created, and new approaches were invented that had previously shown good potential. Different methods of sample transformation and predictor selection were used. These methods were combined with each other.



Methods of sample transformation and predictor selection.

Sample transformation:

  • 1. No transformation.
  • 2. Selection of trades by direction (vector): no selection, buy only, sell only, each saved into a separate sample.
  • 3. Shifting the target function depending on the financial result.
  • 4. "Drop" method: exclusion of rows from the sample by a strong quantum segment of an FP-type predictor, with 10-30 consecutive iterations of evaluating the sample's quantum segments, the sample changing at each iteration:

(a) Exclusion by the best quantum segment on the train sample without regard to the vector, if its deviation from the mean target of the sample exceeds a given percentage; otherwise the segments are additionally evaluated on the vector-specific samples and the best variant is selected.

(b) Same as (a), but quantum segments whose deviation on the test sample goes towards a lower probability for TN are not taken.

(c) Same as (a), but the evaluation for selection is performed on the test sample.

Predictor selection:

  • 1. Quantisation method:

(a) Selection of predictors by statistics, with selection of quant tables for each predictor.

(b) Selection of quants by statistics, with a binary sample.

(c) Combining quantum segments from subsamples with different vectors to form a pooled binary sample.

(d) Selection of predictors by statistics, with selection of quant tables based on binary quanta.

  • 2. Exclusion of predictors with strong correlation.
  • 3. Grouping of predictors by similar response field, with selection of the dominant predictor per field (only after converting the predictors to binary).
  • 4. Selection by the average frequency of predictor use in CatBoost models trained on the training sample split into 8 parts. 5 learning strategies and more than 20 methods of predictor selection based on the obtained statistics are used.
  • 5. Summary selection of predictors for each sample after performing the "Drop" method.
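As an illustration of item 2 in the predictor selection list (exclusion of strongly correlated predictors), here is a minimal greedy sketch. The post does not say which correlation measure or threshold was used, so Pearson correlation, the 0.9 threshold and the "first seen is kept" order are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep a predictor only if |r| with every already kept one is <= threshold.

    `columns` maps predictor name -> list of values; earlier names win.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The result depends on the order of the predictors, so in practice one would probably sort them by some quality score first.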
 

A new record, and this time the test sample is also in profit.

Balance

Model

 

I will describe how the model was obtained - probably for myself, since no one seems interested in how to get something out of a complex sample.

So, the first thing I did was shift the target on the principle that a profit of less than 50 pips counts as a negative outcome, i.e. "0" instead of the previous "1". Fewer than 20% of the labels remain positive, but this made it possible to select more pronounced trend movements.
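The target shift can be sketched as follows. The function names and the list-of-profits input are assumptions; the 50-pip threshold is the one stated above, with exactly 50 pips treated as positive.

```python
def shift_target(profits_pips, min_profit=50):
    """Label a trade 1 only if its profit is at least `min_profit` pips."""
    return [1 if p >= min_profit else 0 for p in profits_pips]

def positive_share(labels):
    """Fraction of labels equal to 1."""
    return sum(labels) / len(labels)

# Toy data, not taken from the thread's sample.
labels = shift_target([120, 30, -40, 50, 10])
print(labels, positive_share(labels))
```

Raising the threshold makes the positive class rarer, which is the class-balance cost mentioned above.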

Then I selected splits from the quant tables for each predictor. I used about 900 tables for the selection, picked splits that shift the probability by 5% or more, and assessed the stability of signal generation within the quantum segment.

The next step was to combine the selected quantum segments. I used an approach with a random element and evaluated the result by the criterion "the more segments, the better". I am not sure the method is perfect; it probably should be improved, and I need to think about the algorithm.

This gave me a combined quant table for the predictors. Predictors without successful segments simply got a single separator, "0.5", in the quant table.
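A rough sketch of selecting quantum segments for one predictor: the value range is cut by the borders of one quant table, and a segment is kept if its share of "1" differs from the share in the whole train sample by at least 5%. Treating the 5% as an absolute difference in shares, the `min_count` filter and the half-open segments are assumptions; the stability check mentioned above is not modelled.

```python
from bisect import bisect_right

def select_segments(values, target, borders, min_shift=0.05, min_count=30):
    """Return (low, high, share_of_ones) for every segment [low, high)
    whose share of 1s deviates from the sample-wide share by >= min_shift.

    Assumes all values are finite numbers and target contains 0/1.
    """
    base = sum(target) / len(target)
    edges = [float("-inf")] + sorted(borders) + [float("inf")]
    counts = [0] * (len(edges) - 1)
    ones = [0] * (len(edges) - 1)
    for v, t in zip(values, target):
        idx = bisect_right(edges, v) - 1  # index of the segment containing v
        counts[idx] += 1
        ones[idx] += t
    selected = []
    for i, n in enumerate(counts):
        if n >= min_count and abs(ones[i] / n - base) >= min_shift:
            selected.append((edges[i], edges[i + 1], ones[i] / n))
    return selected

def quant_table(selected):
    """Borders of the selected segments, or the single 0.5 separator if none."""
    borders = sorted({b for low, high, _ in selected for b in (low, high)
                      if b not in (float("-inf"), float("inf"))})
    return borders or [0.5]
```

Running `select_segments` over each of the roughly 900 tables and pooling the surviving segments would give the input for the combination step described above.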

I trained 100 models with Seed from 8 to 800 in steps of 8.

From the resulting variants I selected the best model and analysed the predictors it used - there turned out to be 77 of them.

I tried training another 100 models, but only on these predictors, again with Seed from 8 to 800 in steps of 8. The best of these models was slightly worse than the previous best model, which of course puzzled me.

I decided to try Seed with a smaller step and in larger numbers, because these predictors can give better results - that has already been shown. I trained 10000 models, with Seed from 1 to 10000 in steps of 1.
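The two seed sweeps can be written down as a small harness. `train_and_score` is a hypothetical callable standing in for actual training (with CatBoost the seed would be passed as the `random_seed` parameter); the dummy scorer below only produces seed-dependent pseudo-random numbers so that the sketch runs on its own.

```python
import random

def sweep_seeds(train_and_score, seeds):
    """Train one model per seed; return (best_seed, {seed: score})."""
    scores = {seed: train_and_score(seed) for seed in seeds}
    best_seed = max(scores, key=scores.get)
    return best_seed, scores

def dummy_train_and_score(seed):
    """Placeholder for real training: a seed-dependent pseudo-random 'profit'."""
    return random.Random(seed).gauss(2000, 1500)

coarse = range(8, 801, 8)   # 100 seeds: 8, 16, ..., 800
fine = range(1, 10001)      # 10000 seeds: 1, 2, ..., 10000

best_seed, scores = sweep_seeds(dummy_train_and_score, coarse)
```

Note that the fine grid contains every seed of the coarse grid, so with an unchanged predictor set the second sweep would reproduce the first 100 models exactly.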

The graph below shows the financial result of the models, ordered from best to worst.

About 25% of the models were unprofitable, which is no longer so bad, and the average profit is 2116.65. 38% of the models have a profit of 3000 points or more.

It is not entirely clear why the results on the test sample do not correlate with those on the exam sample - is it a peculiarity of the subsample, or could there be other reasons?
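The summary figures for a batch of models can be computed with a few lines. The function name is hypothetical, the input below is toy data rather than the thread's results, and counting a profit of exactly zero as unprofitable is an assumption.

```python
def summarise_profits(profits, target=3000):
    """Share of unprofitable models, mean profit, share at or above `target`."""
    n = len(profits)
    return {
        "share_unprofitable": sum(p <= 0 for p in profits) / n,
        "mean_profit": sum(profits) / n,
        "share_at_or_above_target": sum(p >= target for p in profits) / n,
    }

# Toy example with four models.
print(summarise_profits([-500, 1000, 3000, 4500]))
```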

The graph below shows the results for the test sample - ordered in the same way as before - by the exam sample financial result.

And for clarity, a scatter plot - it looks random.

I thought the problem was the metric - points rather than binary statistical metrics - but as the graph below shows, accuracy on the two samples is also unrelated.
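One way to put a number on "looks random" is a rank correlation between the per-model results on exam and on test. Spearman correlation is my choice here, not something the post used; the sketch is pure Python and handles ties with average ranks.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def spearman(exam_results, test_results):
    """Spearman rank correlation between two equally long result lists."""
    return pearson(ranks(exam_results), ranks(test_results))
```

A value near zero across the 10000 models would confirm that ordering models by exam profit says nothing about their test profit; the same function can be applied to accuracy instead of points.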


Without identifying how results on the exam sample depend on the test and train samples, it is difficult to select a model. I think additional evaluation criteria need to be developed, i.e. metrics for the model.

The new models (I noted two) used fewer than 50 predictors. I plan to repeat the process - eventually there will be enough predictors to build a model.

What next? We can train a model on the full sample using only the selected predictors and then see how they behave together on new data.

Besides, I would like to try to find some distinguishing features of the selected predictors that raise the probability of selecting them without training, by analogy with how quantum segments were selected from the tables.

 
Aleksey Vyazmikin #:

Then I selected splits from the quant tables for each predictor. I used about 900 tables for the selection, picked splits that shift the probability by 5% or more, and assessed the stability of signal generation within the quantum segment.

The next step was to combine the selected quantum segments. I used an approach with a random element and evaluated the result by the criterion "the more segments, the better". I am not sure the method is perfect; it probably should be improved, and I need to think about the algorithm.

So it is essentially selecting leaves with >55% probability?

Aleksey Vyazmikin #:

I tried to train 100 more models, but only on these predictors, also with Seed from 8 to 800 with step 8. The result of the best models was slightly worse than the last model. And this puzzled me, of course.

Apparently the randomisation produced by these seeds did not fully coincide with the randomisation of the best variant. Hence the different (worse) results.

Aleksey Vyazmikin #:

It is not clear why the results on the test sample do not correlate with the exam samples - is it a peculiarity of the subsample, or can there be other reasons?

The graph below shows the results for the test sample - ordered in the same way as before - by the exam sample financial result.

It is like ordinary training that overfits to train. In this case, you have fitted to exam. Any fitting, whether to test or to exam, leads to a random result, as we can see on your test.

I don't think you should take the best model by train or by exam. You need something stable, even if its result is much worse than the best on train or exam.

When I was working with Darch, selection there used both samples: err = err_oob * k + err_trn * (1 - k), where k = 0.62 (recommended, but it can be changed).
I.e. err = err_oob * 0.62 + err_trn * 0.38.
But this is one more parameter to tune in selection, and it increases calculation time.
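The weighting above is easy to turn into a selection rule. The function names and the candidate dictionary are hypothetical, and the error values are toy numbers; only the formula and the default k = 0.62 come from the post.

```python
def combined_error(err_oob, err_trn, k=0.62):
    """Weighted error: err_oob * k + err_trn * (1 - k), with k in [0, 1]."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    return err_oob * k + err_trn * (1 - k)

def pick_model(candidates, k=0.62):
    """`candidates` maps name -> (err_oob, err_trn); lowest combined error wins."""
    return min(candidates, key=lambda name: combined_error(*candidates[name], k=k))

# Toy example: a model that is best on train only vs. a more balanced one.
candidates = {"best_on_train": (0.45, 0.10), "stable": (0.30, 0.28)}
print(pick_model(candidates))
```

With k above 0.5 the out-of-bag error dominates, so a model that looks excellent only on train is penalised.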

In my experiments with a sample on H1 there was something stable, but it earned little: 10000 trades, but only 0.00005 per trade. That is not interesting either, since spreads, slippage, etc. will eat up those 5 pts in real trading.

You have 400 trades but 40 pts per trade on the exam. And on the test, as with me, it is close to 0 (random).
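The arithmetic behind this comparison, as a small sketch. The trade counts and average points per trade are the ones quoted above; the 7-point cost per trade (spread plus slippage) is purely a hypothetical figure for illustration.

```python
def net_points(trades, avg_points_per_trade, cost_points_per_trade):
    """Total result in points after subtracting a fixed cost from every trade."""
    return trades * (avg_points_per_trade - cost_points_per_trade)

cost = 7  # hypothetical spread + slippage, in points
print(net_points(10000, 5, cost))   # many trades, tiny edge
print(net_points(400, 40, cost))    # few trades, larger edge
```

An edge smaller than the per-trade cost turns negative no matter how many trades there are, which is why the 5-point result is called uninteresting.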

There are a lot of approaches, but nobody has found one that works.
 
elibrarius #:

So it is essentially selecting leaves with >55% probability?

No. What is selected is, let's say, a numerical range of a single predictor. The 5% is measured relative to the percentage of "1" in the train sample.

elibrarius #:

Apparently the randomisation produced by these seeds did not fully coincide with the randomisation of the best variant. Hence the different (worse) results.

The randomness is fixed :) It seems the seed is applied in a tricky way: probably all predictors allowed for model building are involved, so changing their number also changes the result of the random selection.

elibrarius #:

It is like ordinary training that overfits to train. In this case, you have fitted to exam. Any fitting, whether to test or to exam, leads to a random result, as we can see on your test.

Why is it a fit, or rather, what do you see as the fit? I tend to think that the test sample differs from exam more than exam differs from train, i.e. the predictors have different probability distributions. This can be treated either by selecting the predictors that are most stable, i.e. give acceptable results on all samples, or by changing the probability distribution by an external feature (i.e. another predictor). I don't know of such models, but I would like to try. A similar effect could be obtained by recurrent training on selected leaves of different trees or even on whole models. Perhaps recurrent neural networks can do this - I am not familiar with them.

So far I treat this method as a way to select predictors on which to build the combined model, and as reference points for identifying other effective predictors before actual training.

elibrarius #:

I don't think you should take the best model by train or by exam. You need something stable, even if its result is much worse than the best on train or exam.

When I was working with Darch, selection there used both samples: err = err_oob * k + err_trn * (1 - k), where k = 0.62 (recommended, but it can be changed).
I.e. err = err_oob * 0.62 + err_trn * 0.38.
But this is one more parameter to tune in selection, and it increases calculation time.

What metric is err_?

elibrarius #:

You have 400 trades but 40 pts per trade on the exam. And on the test, as with me, it is close to 0 (random).

There are a lot of approaches, but no one has found a productive one yet.

The X axis is the expected value (expectancy) on the test sample, so in general yes, but there are some successful instances.

