Machine learning in trading: theory, models, practice and algo-trading - page 2129

 
elibrarius:
Are you robots?
Working day and night without sleep or rest ))))

Time zones... but still impressive))))

 
elibrarius:

7 marks are enough.

Here's a histogram of the balance - new models in blue, old models in red.

All settings are the same.

Predictor relevance.



The models rely on time as much as they can. It is hard to say whether that is good or bad, but it looks bad when a predictor gains an advantage simply because it is similar to its analogues.

Recall

Precision

In the end there is a difference; not a big one, but still.

The balance is not bad.


 
Aleksey Vyazmikin:

Here's a histogram of the balance - new models in blue, old models in red.

All settings are the same.

Predictor relevance.



The models rely on time as much as they can. It is hard to say whether that is good or bad, but it looks bad when a predictor gains an advantage simply because it is similar to its analogues.

Recall

Precision

In the end there is a difference; not a big one, but still.

The balance is not bad.


So is the sine + cosine of time better than plain numbers?
Did you feed the minutes in the old version? If not, then do, for a proper comparison. The sine + cosine version takes them into account. Or remove the minutes from sin+cos if that is faster.
 
elibrarius:
So is the sine + cosine of time better than plain numbers?
Did you feed the minutes in the old version? If not, then do, for a proper comparison. The sine + cosine version takes them into account. Or remove the minutes from sin+cos if that is faster.

By the metrics I posted it looks worse. The reason is that a time-related predictor is now more likely to land in the random subset of predictors used to build a tree split.

Yes, I didn't use minutes in my old version.

 
Aleksey Vyazmikin:

By the metrics I posted it looks worse. The reason is that a time-related predictor is now more likely to land in the random subset of predictors used to build a tree split.


Did you use CatBoost? There shouldn't be any random selection of predictors there. Boosting uses all predictors, but shallow trees.

In a random forest, yes. There it is set, for example, by:

max_features : {"auto", "sqrt", "log2"}, int or float, default="auto"

The number of features to consider when looking for the best split:
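To make the effect of this setting concrete, here is a rough sketch of how many candidate predictors a tree would look at per split. It assumes the scikit-learn convention that "sqrt" and "log2" are truncated to an integer with a floor of 1, and that a float is treated as a fraction of the feature count. The helper name features_per_split is hypothetical, and "auto" is left out because its meaning has depended on the estimator type and library version.

```python
import math

def features_per_split(max_features, n_features):
    """Hypothetical helper: candidate features examined at each split."""
    if max_features is None:
        return n_features                      # no random subsetting
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):        # fraction of all features
        return max(1, int(max_features * n_features))
    if isinstance(max_features, int):          # absolute count
        return max_features
    raise ValueError("unsupported max_features value")

for setting in (None, "sqrt", "log2", 0.5, 7):
    print(setting, features_per_split(setting, 100))
```

With 100 predictors, "sqrt" leaves only 10 candidates per split, so adding a second time column (sine plus cosine) noticeably raises the chance that some time predictor is among them.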


Aleksey Vyazmikin:

Yes, I didn't use minutes in my old version.

Maybe the minutes worsened the result? Try removing them in the new version.

This should be a full analogue of your old version:

if(nameInd[nInd]=="Hour")        {CopyTime        (sim,per,startDt,n_bar+1,dtm);TimeToStruct(dtm[0],dts);ArrayResize(tmp,1);tmp[0]=(double)(dts.hour)*360.0/24.0;tmp[0]=(buf==0?MathSin(tmp[0]*pi/180.0):MathCos(tmp[0]*pi/180.0));}

if(nameInd[nInd]=="WeekDay")     {CopyTime        (sim,per,startDt,n_bar+1,dtm);TimeToStruct(dtm[0],dts);ArrayResize(tmp,1);tmp[0]=(double)(dts.day_of_week)*360.0/7.0;tmp[0]=(buf==0?MathSin(tmp[0]*pi/180.0):MathCos(tmp[0]*pi/180.0));}


 
elibrarius:

Did you use CatBoost? There shouldn't be any random selection of predictors there. Boosting uses all predictors, but shallow trees.

In a random forest, yes. There it is set, for example, by:


Maybe the minutes worsened the result? Try removing them in the new version.

This should be a full analogue of your old version:

if(nameInd[nInd]=="Hour")        {CopyTime        (sim,per,startDt,n_bar+1,dtm);TimeToStruct(dtm[0],dts);ArrayResize(tmp,1);tmp[0]=(double)(dts.hour)*360.0/24.0;tmp[0]=(buf==0?MathSin(tmp[0]*pi/180.0):MathCos(tmp[0]*pi/180.0));}

if(nameInd[nInd]=="WeekDay")     {CopyTime        (sim,per,startDt,n_bar+1,dtm);TimeToStruct(dtm[0],dts);ArrayResize(tmp,1);tmp[0]=(double)(dts.day_of_week)*360.0/7.0;tmp[0]=(buf==0?MathSin(tmp[0]*pi/180.0):MathCos(tmp[0]*pi/180.0));}


As for randomness, there is plenty of it.

Weren't you surprised that I have separate time predictors, one with the sine and one with the cosine, whereas, as I understand it now, there should be a single one that uses either the sine or the cosine?

Hence the question: what is buf, and why do we take the cosine when it equals zero?

 
Aleksey Vyazmikin:

As for randomness, there is plenty of it.

Weren't you surprised that I have separate time predictors, one with the sine and one with the cosine, whereas, as I understand it now, there should be a single one that uses either the sine or the cosine?

Hence the question: what is buf, and why do we take the cosine when it equals zero?

buf is the buffer number.
For time there are 2. Some indicators have 1 buffer, others have more than 2.

I iterate over the buffers in a loop when building the columns of the training set.

You have to feed both the sine and the cosine, not just one of them. The explanation why is here: https://megaobuchalka.ru/9/5905.html

Numerical data, it would seem, does not need to be encoded. But in some cases it is reasonable to encode numerical data as well [22]. When encoding numerical data one has to take into account the meaning of the data, where the values lie within their range, and the accuracy of measurement. Let us show this with examples. For instance, encoding makes it possible to respect the meaning of the data. If a network input is an angle between two directions, e.g. the wind direction, the angle itself should not be fed in, whether in degrees or in radians. Such an input would force the network to "learn" that 0 degrees and 360 degrees are the same thing. It is more reasonable to feed the sine and cosine of that angle as inputs. The number of network inputs increases, but close input values are then encoded by close input signals.
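A small sketch of the point made in this quote, applied to the hour of day. It is only an illustration: the function names encode_cyclic and distance are made up for the example and do not come from the thread.

```python
import math

def encode_cyclic(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

h00 = encode_cyclic(0, 24)
h03 = encode_cyclic(3, 24)
h09 = encode_cyclic(9, 24)
h12 = encode_cyclic(12, 24)
h23 = encode_cyclic(23, 24)

# As raw numbers 23:00 and 00:00 are 23 units apart; on the circle they are neighbours.
print(abs(23 - 0), distance(h23, h00), distance(h00, h12))

# The sine alone cannot tell 03:00 from 09:00; the cosine resolves the ambiguity.
print(h03, h09)
```

This is why a single column (only the sine or only the cosine) is not enough: each of them maps two different times of day to the same value, and only the pair identifies the time uniquely.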

 
elibrarius:

buf is the number of the buffer.
For time there are 2. There are indicators with 1 buffer, there are more than 2.

I iterate over the buffers in a loop when building the columns of the training set.

You have to feed both the sine and the cosine, not just one of them. The explanation why is here: https://megaobuchalka.ru/9/5905.html

So I did it right originally - I just don't remember what I was doing anymore...

   double tmp[4];
   int nInd=0;
   MqlDateTime dts;
   double pi=3.1415926535897932384626433832795;
   // The bar time is the same for both buffers, so it is read once.
   TimeToStruct(iTime(Symbol(),PERIOD_CURRENT,0),dts);
   for(int buf=0; buf<2; buf++)
   {
      // Time of day as an angle in degrees (one full turn per 1440 minutes).
      tmp[buf]=(double)(dts.hour*60+dts.min)*360.0/1440.0;
      //tmp[buf]=(double)(dts.hour*60+dts.min)*360.0/24.0;
      // buf 0 holds the sine, buf 1 the cosine of this buffer's own angle.
      tmp[buf]=(buf==0?MathSin(tmp[buf]*pi/180.0):MathCos(tmp[buf]*pi/180.0));

      // Time of week as an angle in degrees (one full turn per 10080 minutes).
      tmp[buf+2]=(double)(dts.day_of_week*1440+dts.hour*60+dts.min)*360.0/10080.0;
      //tmp[buf+2]=(double)dts.day_of_week*360.0/7.0;
      tmp[buf+2]=(buf==0?MathSin(tmp[buf+2]*pi/180.0):MathCos(tmp[buf+2]*pi/180.0));
   }
 
Couldn't we use just one input instead of 4?
Just the number of minutes since Monday 0:00:
dts.day_of_week*1440+dts.hour*60+dts.min
Bad idea, though. To isolate, for example, the first 10 minutes of each hour, the tree would need a lot of splits.
Probably better the way you do it: just days and hours. And maybe minutes.
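A rough way to see the "lot of splits" problem, counting how many separate intervals the target region occupies on each axis. This is only a sketch; the helper intervals and the variable names are invented for the illustration.

```python
def intervals(flags):
    """Count maximal runs of consecutive True values.

    Each run is a separate interval that a tree has to cut out
    with its own thresholds on that axis.
    """
    runs, prev = 0, False
    for flag in flags:
        if flag and not prev:
            runs += 1
        prev = flag
    return runs

MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

# Target region: the first 10 minutes of every hour.
week_axis = [(m % 60) < 10 for m in range(MINUTES_PER_WEEK)]  # single minutes-of-week input
minute_axis = [m < 10 for m in range(60)]                     # separate minute-of-hour input

print(intervals(week_axis))    # 168
print(intervals(minute_axis))  # 1
```

On the single minutes-of-week input the region breaks into 168 disjoint pieces, while on a separate minute input it is one interval that a single threshold can isolate.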
 
elibrarius:
Couldn't we use just one input instead of 4?
Just the number of minutes since Monday 0:00. Bad idea, though. To isolate, for example, the first 10 minutes of each hour, the tree would need a lot of splits.
Probably better the way you do it: just days and hours. And maybe minutes.

Already started training without minutes - let's see.

I also use 1/4 bar time - hours, 4 hours, days.
