Machine learning in trading: theory, models, practice and algo-trading - page 3606

 
Aleksey Nikolayev #:
We have a great calendar in MT5, don't we?

I don't know, I haven't used it. And what period does the data cover?

I think it is important to collect information not only about facts but also about expectations - you need to collect text from news, traders' reactions to them on forums or wherever they communicate, as well as the opinions of bank analysts.

It is possible that traders are actually trading ahead of the news, and in fact they are already selling currency to those who need it - for example, to importers.

 
Maxim Dmitrievsky #:
On train and validation, of course. The test set has no labels. What data are you passing to the function? These are beginner's questions.

I am asking because I am surprised that everything works so well for you, judging by the pictures and words. So I think either I am doing something wrong, or there is a mistake on my side or yours...

Maxim Dmitrievsky #:
The benchmark is the distribution of labels in the clusters. Ideally, each cluster should contain only 0s or only 1s. Then the data is completely predictable. Approaching this benchmark is called correction.

I will keep your terminology in mind. Of course, the model itself becomes simpler in theory, as there are fewer contradictions; but whether it is better on new data, outside of training, I can't tell yet, as I don't observe enough stable clusters - I don't know why that is.
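For illustration, a minimal R sketch of measuring that benchmark on toy data (X, y, and the cluster count here are placeholders, not anyone's actual pipeline): a per-cluster purity of 1.0 means the cluster is already all 0s or all 1s.

# toy features and binary labels; substitute your own dataset here
set.seed(1)
X <- matrix(rnorm(2000), ncol = 4)
y <- rbinom(nrow(X), 1, 0.5)
cl <- kmeans(X, centers = 10)$cluster              # cluster id per observation
purity <- tapply(y, cl, function(v) max(mean(v), 1 - mean(v)))
round(purity, 2)                                   # 1.0 = perfectly pure cluster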

Maxim Dmitrievsky #:
It follows automatically that the larger the sample, the higher the statistical significance of the patterns.

Do I remember correctly that you mark up every bar? I just don't do that. I thought about it, but I am confused by the fact that neighbouring bars get the same labels while sometimes containing inclusions of opposite labels. So when the market is flat, 30 bars are labelled at one point, but when the market is fast, only 2-3 bars are; the value of these observations is the same, yet in the sample the former will have a preponderance. Do you have any ideas on how to deal with this correctly?
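To make the imbalance concrete, a toy R sketch (hypothetical labels; inverse run-length weights are just one conceivable counter-measure): a 30-bar flat run and a 2-bar fast run end up contributing the same total weight.

# toy per-bar labels: a 30-bar flat run, then a 3-bar and a 2-bar run
labels <- c(rep(1, 30), rep(0, 3), rep(1, 2))
r <- rle(labels)                            # runs of identical consecutive labels
w <- rep(1 / r$lengths, r$lengths)          # each run gets a total weight of 1
sum(w[1:30]); sum(w[31:33]); sum(w[34:35])  # all equal to 1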

 
Aleksey Vyazmikin #:
What period does the data cover?

Perhaps the CalendarValueHistory() function will help answer this question.

Aleksey Vyazmikin #:
I think it is important to collect information not only about facts but also about expectations - you need to collect text from news, traders' reactions to them on forums or wherever they communicate, as well as the opinions of bank analysts.

Yeah, that wouldn't hurt either

Aleksey Vyazmikin #:
It is possible that traders are actually trading ahead of the news, and in fact they are already selling currency to those who need it - for example, to importers.
That may well be the case. Like any hypothesis, it needs to be tested.
 
Aleksey Vyazmikin #:

I am asking because I am surprised that everything works so well for you, judging by the pictures and words. So I think either I am doing something wrong, or there is a mistake on my side or yours...

I will keep your terminology in mind. Of course, the model itself becomes simpler in theory, as there are fewer contradictions; but whether it is better on new data, outside of training, I can't tell yet, as I don't observe enough stable clusters - I don't know why that is.

Do I remember correctly that you mark up every bar? I just don't do that. I thought about it, but I am confused by the fact that neighbouring bars get the same labels while sometimes containing inclusions of opposite labels. So when the market is flat, 30 bars are labelled at one point, but when the market is fast, only 2-3 bars are; the value of these observations is the same, yet in the sample the former will have a preponderance. Do you have any ideas on how to deal with this correctly?

From the point of view of skilful curve fitting, theoretical constructions make sense only in terms of ML and mathematical statistics. The rest is simply tested through experimentation. There are no obvious contradictions in my theoretical arguments. Experiments also show pretty good curves, that is, they show an improvement in the fit when I apply some of my theoretical conclusions. I don't put any other hidden meaning into it.

Also, you won't find such information anywhere else, because these are my intellectual higher... products. :)

Markup on each bar gives more data, which can then be filtered and/or corrected. If you further divide the time series into states and fix the labels within each state, there will be fewer conflicts in the markup. The proposed method is simply meant to automatically correct labels assigned at random: highly correlated observations should get the same labels, and it brings order to that.
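For illustration, a minimal R sketch of such a correction on toy data (kmeans here merely stands in for whatever state split is actually used): every observation is pulled to its cluster's majority label.

set.seed(1)
X <- matrix(rnorm(1000), ncol = 2)      # toy features
y <- rbinom(nrow(X), 1, 0.5)            # toy, partly random labels
cl <- kmeans(X, centers = 8)$cluster    # "states" via clustering
y_fix <- ave(y, cl, FUN = function(v) as.integer(mean(v) >= 0.5))
mean(y_fix != y)                        # share of labels flipped by the correction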

 

Shall we train a self-learning model with the OpenAI API?


 
Maxim Dmitrievsky #:
Experiments also show pretty good curves.

Can you provide statistics on what percentage of, say, 1000 attempts produce a strategy with a positive outcome, taking a reasonable spread into account?

Ideally, of course, statistics on clusters of these 1000 attempts would be better.

How many examples are there in one cluster with a biased probability, as a percentage of its class?

Maxim Dmitrievsky #:
You won't find such information anywhere else, because these are my intellectual higher... products. :)

Thanks for sharing your achievements!

Maxim Dmitrievsky #:
Markup on each bar gives more data that can be filtered and/or corrected.

I think that to compute the statistics we should take these clusters into account and count each as a single example, because if a repeated signal passes over such a cluster, building the model will count 10 signals rather than 1. I mean this from an economic point of view: instead of 1 lot you would not open 10, one on each bar... or would you?
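A toy R sketch of that way of counting (the signal stream is hypothetical): rle() collapses a run of repeated signals into a single example while remembering how many bars it spanned.

sig <- c(rep(1, 10), 0, 1, 1)   # 10 repeated signals on consecutive bars, then single ones
r <- rle(sig)
r$values                        # the collapsed examples: 1 0 1
r$lengths                       # bars per collapsed example: 10 1 2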

 
Yuriy Vasilyev #:

Shall we train a self-learning model with the OpenAI API?

Tell us what that is and what results you get.

 
Aleksey Vyazmikin #:

Can you provide statistics on what percentage of, say, 1000 attempts produce a strategy with a positive outcome, taking a reasonable spread into account?

Ideally, of course, statistics on clusters of these 1000 attempts would be better.

How many examples are there in one cluster with a biased probability, as a percentage of its class?

Thanks for sharing your achievements!

I think that to compute the statistics we should take these clusters into account and count each as a single example, because if a repeated signal passes over such a cluster, building the model will count 10 signals rather than 1. I mean this from an economic point of view: instead of 1 lot you would not open 10, one on each bar... or would you?

I have not made such measurements, because getting reliable statistics takes a lot of time. Some algorithms are running on real accounts, in the black.

Good curves are obtained after almost every training run, depending on the algorithm's settings. Training is very fast; that was a priority.

Periodically I come up with something new. If the new one turns out to be better than the old one, it means that I am moving in the right direction.

I like that these TSs are wooden (I mean oak) and often trade on hourly bars. They are hard to break with spread or news, and there is almost no need to monitor them.
 
Maxim Dmitrievsky #:
I have not made such measurements, because getting reliable statistics takes a lot of time. Some algorithms are running on real accounts, in the black.

Good curves are obtained after almost every training run, depending on the algorithm's settings. Training is very fast; that was a priority.

Periodically I come up with something new. If the new one turns out to be better than the old one, it means that I am moving in the right direction.

I like that these TSs are wooden (I mean oak) and often trade on hourly bars. They are hard to break with spread or news, and there is almost no need to monitor them.

Here are the statistics on 35,000 hourly bars:

> summary(abs(Frame_Base$Close.1_d1))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.0001900 0.0004500 0.0006861 0.0008900 0.0173100 

Half of the bars have an increment of less than 4.5 pips, and 75% have less than 9 pips (in four-digit quotes).

If we take into account that the stop should be smaller than the profit, i.e. we need labels with an increment of at least 15 pips, while 75% of hourly bars move less than 9 pips, then nothing can be built on single hourly bars.


Something faintly takes shape on H2:

> summary(abs(Frame_Base$Close.1_d2))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.0002800 0.0006300 0.0009759 0.0012700 0.0214600 

And on H3 there is something that can at least be discussed relative to the spread:

> summary(abs(Frame_Base$Close.1_d3))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.000000 0.000340 0.000790 0.001206 0.001590 0.023190
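For reference, a sketch of how a lag-k increment column like those above might be built (this construction is an assumption; only the frame and column names come from the output above).

# hypothetical reconstruction of a Close.1_dk column as a lag-k close-to-close difference
lag_inc <- function(close, k) c(rep(NA, k), diff(close, lag = k))
# summary(abs(lag_inc(Frame_Base$Close, 1)))   # would mirror the Close.1_d1 summary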
 
СанСаныч Фоменко #:
So how do I trade on minutes, then? It turns out it's impossible, and I didn't know it.