Machine learning in trading: theory, models, practice and algo-trading - page 3187

 
fxsaber #:

PS In general, if there is interest in trying to find differences between the two series, I can provide them.

Have a look at what I wrote to you. I will only be able to look at it myself in the autumn.

 
Aleksey Nikolayev #:

Forester #:

I ran an experiment on the sample for which I published the GIFs; it already contains 47% ones, and the data are summarised in the table.


Description of the content of the columns:

  • Generation - the index of the random generation of the target with a fixed number of "1" and "0" responses; the last row is the original target.
  • % Similarity all - the percentage of target values that coincide with the original target.
  • % Similarity "1" - the same percentage of coincidence, but only for "1" responses.
  • % Similarity "0" - the same percentage of coincidence, but only for "0" responses.
  • Q_All - the total number of quantum segments found using 870 quantum tables and 6533 predictors.
  • Q_All % - "Q_All" as a percentage of the value for the sample with the original target.
  • Q selected - how many quantum segments were selected (only those whose ranges do not overlap are selected).
  • Q selected % - "Q selected" as a percentage of the value for the sample with the original target.
  • Predictors - for how many predictors in the sample a quantum segment satisfying the given criteria could be found.
  • Predictors % - "Predictors" as a percentage of the value for the sample with the original target.
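The similarity columns above are straightforward to compute. A minimal NumPy sketch with a toy target and hypothetical function names, assuming a binary 0/1 target vector:

```python
import numpy as np

def similarity(original, shuffled):
    """Percentage of positions where the candidate target matches the
    original: overall, and separately over "1" and "0" responses."""
    original = np.asarray(original)
    shuffled = np.asarray(shuffled)
    match = original == shuffled
    return {
        "all":   100.0 * match.mean(),
        "ones":  100.0 * match[original == 1].mean(),
        "zeros": 100.0 * match[original == 0].mean(),
    }

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)   # stand-in for the original target
y_shuffled = rng.permutation(y)     # random target with the same class counts
print(similarity(y, y_shuffled))
```

With balanced classes, a random shuffle matches the original at around 50% of positions on average, which is exactly the "in the neighbourhood of 50%" effect noted below.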

Let me explain that for one predictor more than one quantum segment can be selected in total, and these segments should not overlap in the range of the predictor value.
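Selecting several non-overlapping segments per predictor is a classic interval-selection problem. A sketch of one simple way to do it, greedy by segment quality (the `(lo, hi, quality)` tuples and the scoring are hypothetical, not the author's actual criteria):

```python
def select_non_overlapping(segments):
    """Greedily pick quantum segments so their value ranges do not overlap.
    Each segment is (lo, hi, quality); higher quality is taken first."""
    chosen = []
    for lo, hi, q in sorted(segments, key=lambda s: -s[2]):
        # Keep the segment only if it does not intersect any already chosen one.
        if all(hi <= c_lo or lo >= c_hi for c_lo, c_hi, _ in chosen):
            chosen.append((lo, hi, q))
    return chosen

segs = [(0.0, 0.3, 0.9), (0.2, 0.5, 0.8), (0.4, 0.7, 0.6)]
print(select_non_overlapping(segs))  # (0.2, 0.5) is dropped: it overlaps (0.0, 0.3)
```

A greedy pass maximises neither count nor total quality in general, but it reproduces the stated constraint: no two selected segments share any part of the predictor's value range.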

What I don't like is that around 50% of the target values remain in place (coincide with the original), which can negatively affect the evaluation of the result.

In fact, quite a lot of quantum segments were found on the random targets, but because they apparently formed clusters, different tables overlapped in their coordinates; after selecting non-overlapping ranges, it turned out that the quality (utility) of these quantum segments is about 10 times worse than for the original target. Accordingly, on the sample with the original target, on average 3.5 times more quantum segments were found across the different predictors.

What do you think of the results?

Added:

The binary sequence plot of the random target and the original looks like this


 
Aleksey Vyazmikin #:

I ran an experiment on the sample for which I published the GIFs; it already contains 47% ones, and the data are summarised in the table.



What do you think about the results?

A question for Alexei: I am not strong in statistical theory. I simply suggested shuffling the target instead of generating it.
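For a binary target, shuffling and "generation with a fixed number of '1' and '0'" are the same operation: a random permutation of the original target is exactly a random target with the class counts held fixed. A toy sketch (the 47%-ones target is made up to match the sample described above):

```python
import numpy as np

rng = np.random.default_rng(42)
y = np.array([1] * 470 + [0] * 530)  # toy target: 47% ones, as in the sample

# Shuffling ("mixing") the target keeps the counts of "1" and "0" exactly,
# so it is equivalent to generating a random target with fixed class counts.
y_mixed = rng.permutation(y)
print(y.sum(), y_mixed.sum())  # both 470
```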
 
Forester #:
A question for Alexei: I am not good at statistical theory. I simply suggested shuffling the target instead of generating it.

I see.

I have another suggestion for you: what if we make the forest-building process more controllable, and take a specific subsample of a selected quantum segment as the root for each tree?

Make the depth about 2-3 splits, so that each leaf contains at least 1% of the examples of the class being classified.

I think the model will be more stable.
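The suggestion above can be sketched with scikit-learn: restrict the training rows to a quantum segment's value range, then fit a shallow tree on that subsample. This is a toy illustration under assumed data and a hypothetical segment range, not the author's actual pipeline:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Hypothetical quantum segment: predictor 0 restricted to [0.1, 1.5].
lo, hi, col = 0.1, 1.5, 0
mask = (X[:, col] >= lo) & (X[:, col] <= hi)

# One shallow "tree of the forest", trained only on the segment's subsample.
# max_depth caps the tree at 3 splits; min_samples_leaf=0.01 keeps every
# leaf at >= 1% of the subsample's rows.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=0.01, random_state=0)
tree.fit(X[mask], y[mask])
print(tree.get_depth())
```

Repeating this for each selected quantum segment and averaging the trees' predictions would give the "forest rooted at quantum segments" described above.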

 
Aleksey Vyazmikin #:

I ran an experiment on the sample for which I published the GIFs; it already contains 47% ones, and the data are summarised in the table.



What do you think of the results?



Ten simulations is nothing, you need thousands for statistical significance.

Also, I am not ready to give an expert opinion on this particular case; I only pointed out possible problems and common ways of solving them.

 
Aleksey Vyazmikin #:

What do you think of the results?

Added:

The binary sequence plot of the random target and the original looks like this

You're busy with some pointless, merciless nonsense. fxsaber at least did it in half an hour and forgot about it.
 
Aleksey Nikolayev #:

Ten simulations is nothing, you need thousands for statistical significance.

I am also not ready to give an expert opinion on a particular case, but only pointed out possible problems and common ways of solving them.

Thousands would take too many computational resources: one pass takes about 40 minutes, with the main calculation on a GPU.

In general, I thought this test only checks whether such clusters are possible on different ranges of the predictor.

What really needs to be looked at is the probability of hitting the particular range of a quantum segment that was already selected initially.

And I would still like to hear an opinion on how different the target should be, in percentage terms, for such a test to be reliable.

 
Maxim Dmitrievsky #:
You're busy with some pointless, merciless nonsense. fxsaber at least did it in half an hour and forgot about it.

Keep your assessments of other people's performance to yourself, especially when you don't understand what the other person is doing.

I am open to constructive criticism, and there is none coming from you.

 
Aleksey Vyazmikin #:

Keep your assessments of other people's performance to yourself, especially when you don't understand what the other person is doing.

I am open to constructive criticism, and there is none coming from you.

What you're doing is bullshit. It has been written several times that you will get ANY result on random data. Open your eyes and see. Nothing to add :)

Do you at least understand what you are doing and why? :)
 
Maxim Dmitrievsky #:
What you're doing is bullshit. It has been written several times that you will get ANY result on random data. Open your eyes and see. Nothing to add :)

If you think the market is random, then why are you wasting your time? No model will work, except by chance.
