Machine learning in trading: theory, models, practice and algo-trading - page 2065

 
Evgeniy Chumakov:

Assuming there are no gaps in the history and every day has 1440 minutes (fewer on Friday), the code should look like this:

There are gaps even on EURUSD. Work with the bar time.
 
elibrarius:
There are gaps even on EURUSD. Work with the bar time.


That's understood; if I were doing it for real, that's what I would do. It was just simpler to explain it this way.

 

Somehow it seems to work) To save space, the new series is stored in rates[i].high:

#property script_show_inputs
//+------------------------------------------------------------------+
input datetime tstart = D'2020.5.1 00:00';  // start of the studied time interval
input datetime tstop = D'2020.10.1 00:00';   // end of the studied time interval
string zztn = "dvol\\" + _Symbol + ".txt"; // text file name
//+------------------------------------------------------------------+
#define  NM 1440
ulong NSM = 60;
ulong NSD = 60 * 1440;
//+------------------------------------------------------------------+
void OnStart()
{
  MqlRates rates[];
  double vol[NM] = {0.0}, d;
  int n[NM] = {0}, t;
  int nprice = CopyRates(Symbol(), PERIOD_M1, tstart, tstop, rates);
  for(int i = 0; i < nprice; ++i)
  {
    t = (int)((((ulong)rates[i].time) % NSD) / NSM);
    d = rates[i].close - rates[i].open;
    ++n[t];
    vol[t] += d * d;
  }
  for(int i = 0; i < NM; ++i)
  {
    if(n[i] > 1) vol[i] /= n[i];
    vol[i] = sqrt(vol[i]);
  }
  for(int i = 0; i < nprice; ++i)
  {
    t = (int)((((ulong)rates[i].time) % NSD) / NSM);
    if(vol[t] > 0) rates[i].high = (rates[i].close - rates[i].open) / vol[t];
    if(i > 0) rates[i].high += rates[i - 1].high;
  }
  int ft = FileOpen(zztn, FILE_WRITE | FILE_COMMON | FILE_ANSI | FILE_TXT);
  FileWriteString(ft, "t p");
  for(int i = 0; i < nprice; ++i)
    FileWriteString(ft, "\n" + (string)((ulong)rates[i].time) + " "  + (string)rates[i].high);
  FileClose(ft);
}
 
Aleksey Nikolayev:

Somehow it seems to work) To save space, the new series is stored in rates[i].high

You normalize past bars (for example, from 2020.5.1 00:00) using bars from the future, up to 2020.10.1 00:00, and everything in between.
You cannot do that in real trading.
You have to perform the same calculation for every bar, but using only the bars before it.
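The causal variant of the script's normalization can be sketched as follows. This is a C++ sketch (not the thread's MQL5 code), with a hypothetical `Bar` struct standing in for `MqlRates`: for each bar, the per-minute-of-day volatility estimate is built only from earlier bars, so there is no look-ahead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MqlRates; only the fields used here.
struct Bar { int64_t time; double open, close; };

constexpr int64_t kSecPerMin = 60, kSecPerDay = 86400;
constexpr int64_t kMinutesPerDay = 1440;

// Normalize each bar's increment by the RMS of increments of PAST bars
// sharing the same minute-of-day. Bars must be in chronological order.
std::vector<double> normalizeCausal(const std::vector<Bar>& bars) {
    std::vector<double> sumSq(kMinutesPerDay, 0.0);
    std::vector<int> n(kMinutesPerDay, 0);
    std::vector<double> out(bars.size(), 0.0);
    for (size_t i = 0; i < bars.size(); ++i) {
        int t = static_cast<int>((bars[i].time % kSecPerDay) / kSecPerMin);
        double d = bars[i].close - bars[i].open;
        // Volatility estimate from past bars of this minute only.
        double vol = n[t] > 0 ? std::sqrt(sumSq[t] / n[t]) : 0.0;
        out[i] = vol > 0.0 ? d / vol : 0.0;
        // Update the statistics AFTER using them, so bar i never sees itself.
        sumSq[t] += d * d;
        ++n[t];
    }
    return out;
}
```

The first occurrence of each minute-of-day gets 0 (no history yet); in a real script one would warm the estimator up on a preliminary window before using the output.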

 
Aleksey Nikolayev:

Somehow it seems to work) To save space, the new series is stored in rates[i].high

Even if you do it correctly, these normalized candle heights can be reconstructed very accurately with a neural net or a forest.
As features we feed the heights of 60 candles with a one-day offset each, and train on the normalized heights produced by your code.
Training accuracy should be close to 100%.

That is, the normalized candle heights bring no new information: if the model needs them, it will reproduce them internally and take them into account.
The only benefit is that you don't have to pass 60 extra features into training.
It's unlikely anyone would feed two-month-old bars as features, so for those who haven't fed them there is still some new information).

If these normalized candle heights improve the model's performance, then of course they should be used (either your single feature, which is preferable, or the 60 features it is built from).

 
Let's actually test something already, instead of writing three pages and wasting energy, when the change in error, God forbid, turns out to be half a percent.
 
elibrarius:

Weird. I wonder how this can be explained?
I have another version commented out, but I didn't like it for logical reasons:

Which RandomInteger() do you use? Mine is XOR-based.

I don't know how to explain :)

I took this function

int RandomInteger(int max_vl)
{
   return (int)MathFloor((MathRand()+MathRand()*32767.0)/1073741824.0*max_vl);  // random int in [0, max_vl)
}
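To see why this construction works, here is a C++ sketch of the same logic (the hypothetical `rnd15()` stands in for MQL5's `MathRand()`, which returns a uniform value in 0..32767). Two 15-bit draws are combined into roughly 30 bits, and since the maximum combined value, 32767 + 32767*32767 = 1073709056, is strictly below the divisor 1073741824 (2^30), the result always lands in [0, max_vl).

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Stand-in for MQL5 MathRand(): a draw in 0..32767 (15 bits).
static int rnd15() { return std::rand() & 0x7FFF; }

int randomInteger(int max_vl) {
    // combined is in [0, 1073709056], strictly below 2^30 = 1073741824,
    // so combined / 2^30 is in [0, 1) and the result is in [0, max_vl).
    double combined = rnd15() + rnd15() * 32767.0;
    return static_cast<int>(std::floor(combined / 1073741824.0 * max_vl));
}
```

Note the low 15 bits come from a separate draw than the high bits, so the combined value is not perfectly uniform over 2^30 states, but for shuffling-type uses this is usually acceptable.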
 

Maxim, I suspect the model is not exported from CatBoost to C++ correctly. Can you compare it with the Python model?

The model-interpretation values in MQL5 taken from the CPP model do not match the values from the binary model. The delta is around 0.15, which is a lot.

 
elibrarius:

You normalize past bars (for example, from 2020.5.1 00:00) using bars from the future, up to 2020.10.1 00:00, and everything in between.
You cannot do that in real trading.
You have to perform approximately the same calculation for every bar, but using only the bars before it.

Yes, there is look-ahead, as well as other problems that are not immediately obvious. It is not really applicable for live trading, but for preliminary analysis it is indispensable. For example, computing correlation between unnormalized increments makes no sense.
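As one concrete preliminary-analysis step of the kind mentioned: Pearson correlation between two (normalized) increment series. A minimal C++ sketch, with a hypothetical `pearson` helper that is not part of the thread's code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pearson correlation coefficient of two equal-length series.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}
```

On raw increments a few high-volatility bars would dominate both sums; feeding volatility-normalized increments makes each bar contribute on a comparable scale.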

 
Aleksey Nikolayev:

Yes, there is look-ahead, as well as other problems that are not immediately obvious. It is not really applicable for live trading, but for preliminary analysis it is indispensable. For example, computing correlation between unnormalized increments makes no sense.

I'll take that into account and maybe someday redo it using only previous bars and check whether it improves trainability.
If anyone has already checked this, please let me know.

What preliminary analysis is needed? You feed the feature to the model's input and compare the results with and without it.

It seems better to me to normalize by the last 30 minutes.
As an alternative: the last 30 minutes of the current day plus the same 30 minutes from each of the 5 previous days.

Given the volatility change in March, your variant will take a long time to adapt: current values will be much higher than they were a month or two earlier. As a result the model will operate in an unknown zone; it simply will not have examples of such conditions on which to base forecasts.

With normalization over the last week, it will learn the new rules of the game faster.
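The "normalize by recent history" idea can be sketched with a trailing window. A C++ sketch (not the thread's MQL5 code; the window here is a simple count of recent increments, standing in for "last 30 minutes" or "last week"): after a volatility regime change, the estimate catches up within one window length instead of being diluted by months of old history.

```cpp
#include <cassert>
#include <cmath>
#include <deque>
#include <vector>

// Normalize each increment by the RMS of the previous `window` increments.
// Causal: the estimate for element i uses only elements before i.
std::vector<double> normalizeRolling(const std::vector<double>& d,
                                     size_t window) {
    std::deque<double> buf;  // squared increments currently in the window
    double sumSq = 0.0;
    std::vector<double> out(d.size(), 0.0);
    for (size_t i = 0; i < d.size(); ++i) {
        if (!buf.empty()) {
            double vol = std::sqrt(sumSq / buf.size());
            if (vol > 0.0) out[i] = d[i] / vol;
        }
        // Slide the window forward AFTER producing the output for bar i.
        buf.push_back(d[i] * d[i]);
        sumSq += buf.back();
        if (buf.size() > window) { sumSq -= buf.front(); buf.pop_front(); }
    }
    return out;
}
```

A per-minute-of-day variant would keep one such window per minute, combining this with the causal scheme above; the window length then trades adaptation speed against estimator noise.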