Machine learning in trading: theory, models, practice and algo-trading - page 3420

 
mytarmailS #:

Well, if it's just dumb averaging, why didn't the authors bother to add selling? )))


Well, I personally know one Forex pattern that works like clockwork every day, but only on longs.

Inflation is a pattern of the same kind: the stock market keeps growing thanks to it.

Well, it doesn't work that way on the sell side.)

 
Maxim Dmitrievsky #:

Well, it doesn't work that way already )

Which confirms that a pattern doesn't have to work both ways :)

 
Forester #:
It's self-written. But it contains checking code: when I checked, its cluster assignments on the training matrix matched those of KMeansGenerate exactly.


Each of the Restarts begins from different starting points. There is randomisation, but it is (probably) not repeatable; I haven't checked. I think this could be refined if you really need it...

I've sketched out the code and it compiles, but I get an error inside the library. Maybe I'm passing the data in the wrong way?

//+------------------------------------------------------------------+
//|                                                 Tree_K-Means.mq5 |
//|                        Copyright 2016, MetaQuotes Software Corp. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property link      "https://www.mql5.com/ru/users/-aleks-"
#property version   "1.00"
#property script_show_inputs
#property strict
#include <Math\Alglib\alglib.mqh>

//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
{
   int arr_Klaster[]; //Array receiving the cluster number of each row
   double arr_Data_Load[];//Array holding the sample to cluster
   int Strok_Total_Data=1000;//Number of rows
   int Stolb_Total_Data=100;//Number of columns
   
//---Create the sample
   ArrayResize(arr_Data_Load,Strok_Total_Data*Stolb_Total_Data);
   ArrayInitialize(arr_Data_Load,0);
   MathSrand(100);
   int Get_Rand=0;
   for(int i=0; i<Strok_Total_Data; i++)
      for(int j=0; j<Stolb_Total_Data; j++)
      {
         Get_Rand=MathRand();
         if(Get_Rand>16000)arr_Data_Load[i*Stolb_Total_Data+j]=1;
      }
      
//---Cluster the data
   f_Klaster(arr_Data_Load,arr_Klaster,Stolb_Total_Data,Strok_Total_Data,3);
}

//+------------------------------------------------------------------+
//| Performs the clustering
//+------------------------------------------------------------------+
void f_Klaster(double &Input_arr_Data[],int &Output_arr_Klaster[],int N_Stolb,int N_Strok,int N_Klasters)
{
   CMatrixDouble MatrixLearn;//training part of the data

   for(int col=0; col<N_Stolb; col++)//number of columns
   {
      for(int row=0; row<N_Strok; row++)//number of rows
      {
         MatrixLearn[row].Set(col,Input_arr_Data[N_Stolb*row+col]);//row/column
      }
   }

   CMatrixDouble MatrixCentroid;//array of centroids
   int KM_info=0;
   CAlglib::KMeansGenerate(
      MatrixLearn,//- array [rows, columns]
      N_Strok,// - number of rows
      N_Stolb,// - number of columns
      N_Klasters,//- number of clusters you want to get
      10,// - number of restarts for a better initialisation
      KM_info,//- error/result code
      MatrixCentroid,//- array of centroids
      Output_arr_Klaster// - cluster number for each row of XY
   );

   Print("KM_info=",KM_info);
}
//+------------------------------------------------------------------+
//| k-means++ clustering -> predict the cluster number               |
//| INPUT PARAMETERS:                                                |
//|   XY          - dataset, array [0..NPoints-1,0..NVars-1].        |
//|   NPoints     - dataset size, NPoints>=K                         |
//|   NVars       - number of variables, NVars>=1                    |
//|   K           - desired number of clusters, K>=1                 |
//|   CT          - array[0..NVars-1,0..K-1], matrix whose columns   |
//|                 store the cluster centres                        |
//| OUTPUT PARAMETERS:                                               |
//|   XYC         - array[NPoints], containing the cluster           |
//|                 index of each point                              |
//+------------------------------------------------------------------+
void predict_CKMeans(CMatrixDouble &xy,const int npoints,
                     const int nvars,const int k,
                     CMatrixDouble &ct,int &xyc[])
{
//--- fill XYC with center numbers
   ArrayResize(xyc,npoints);
   for(int i=0; i<npoints; i++)
   {
      int cclosest=-1;
      double dclosest=1E300, tmp;
      for(int j=0; j<k; j++)
      {
         double v=0.0;
         for(int i_=0; i_<nvars; i_++)
         {
            tmp=xy[i][i_]-ct[i_][j];
            v+=tmp*tmp;
         }
         if(v<dclosest)
         {
            cclosest=j;   //--- check
            dclosest=v;
         }
      }
      xyc[i]=cclosest;//--- change value
   }
}
//+------------------------------------------------------------------+

Here is the error:

2024.03.11 14:45:46.489 Tree_K-Means_Test (USDJPY,H1)   index out of range in 'dataanalysis.mqh' (5684,28)

The error is raised here:

void CKMeans::KMeansGenerate(CMatrixDouble &xy,const int npoints,
                             const int nvars,const int k,
                             const int restarts,int &info,
                             CMatrixDouble &c,int &xyc[])
  {
//--- create variables
   int    i=0;
   int    j=0;
   double e=0;
   double ebest=0;
   double v=0;
   int    cclosest=0;
   bool   waschanges;
   bool   zerosizeclusters;
   int    pass=0;
   int    i_=0;
   double dclosest=0;
//--- creating arrays
   int    xycbest[];
   double x[];
   double tmp[];
   double d2[];
   double p[];
   int    csizes[];
   bool   cbusy[];
   double work[];
//--- create matrix
   CMatrixDouble ct;
   CMatrixDouble ctbest;
//--- initialization
   info=0;
//--- Test parameters
   if(npoints<k || nvars<1 || k<1 || restarts<1)
     {
      info=-1;
      return;
     }
//--- TODO: special case K=1
//--- TODO: special case K=NPoints
   info=1;
//--- Multiple passes of k-means++ algorithm
   ct.Resize(k,nvars);
   ctbest.Resize(k,nvars);
   ArrayResizeAL(xyc,npoints);
   ArrayResizeAL(xycbest,npoints);
   ArrayResize(d2,npoints);
   ArrayResize(p,npoints);
   ArrayResize(tmp,nvars);
   ArrayResizeAL(csizes,k);
   ArrayResizeAL(cbusy,k);
//--- change value
   ebest=CMath::m_maxrealnumber;
//--- calculation
   for(pass=1; pass<=restarts; pass++)
     {
      //--- Select initial centers  using k-means++ algorithm
      //--- 1. Choose first center at random
      //--- 2. Choose next centers using their distance from centers already chosen
      //--- Note that for performance reasons centers are stored in ROWS of CT,not
      //--- in columns. We'll transpose CT in the end and store it in the C.
      i=CMath::RandomInteger(npoints);
      for(i_=0; i_<=nvars-1; i_++)
         ct.Set(0,i_,xy[i][i_]);
      cbusy[0]=true;
      for(i=1; i<=k-1; i++)
         cbusy[i]=false;
 
Aleksey Vyazmikin #:
void f_Klaster(double &Input_arr_Data[],int &Output_arr_Klaster[],int N_Stolb,int N_Strok,int N_Klasters)
{
   CMatrixDouble MatrixLearn;//training part of the data
Set the matrix size (see the example in the file-reading function)
   for(int col=0; col<N_Stolb; col++)//number of columns
   {
      for(int row=0; row<N_Strok; row++)//number of rows
      {
         MatrixLearn[row].Set(col,Input_arr_Data[N_Stolb*row+col]);//row/column
      }
   }

That's the first thing I noticed:

MatrixLearn.Resize(rows, cols);
 
Forester #:

That's the first thing I noticed:

Thanks, that really helped. It's strange, though, that no error is raised when the matrix is filled with data from the array.

With this code:

//+------------------------------------------------------------------+
//| Performs the clustering
//+------------------------------------------------------------------+
void f_Klaster(double &Input_arr_Data[],int &Output_arr_Klaster[],int N_Stolb,int N_Strok,int N_Klasters)
{
   CMatrixDouble MatrixLearn;//training part of the data
   MatrixLearn.Resize(N_Strok, N_Stolb);
   for(int col=0; col<N_Stolb; col++)//number of columns
   {
      for(int row=0; row<N_Strok; row++)//number of rows
      {
         MatrixLearn[row].Set(col,Input_arr_Data[N_Stolb*row+col]);//row/column
      }
   }

   CMatrixDouble MatrixCentroid;//array of centroids
   int KM_info=0;
   CAlglib::KMeansGenerate(
      MatrixLearn,//- array [rows, columns]
      N_Strok,// - number of rows
      N_Stolb,// - number of columns
      N_Klasters,//- number of clusters you want to get
      10,// - number of restarts for a better initialisation
      KM_info,//- error/result code
      MatrixCentroid,//- array of centroids
      Output_arr_Klaster// - cluster number for each row of XY
   );

   Print("KM_info=",KM_info);
   switch(KM_info)
   {
   case -3:
      Print("-3: the problem is degenerate (the number of distinct points is less than K)");
      break;
   case -1:
      Print("-1: invalid NPoints/NFeatures/K/Restarts were passed");
      break;
   case 1:
      Print("1: the subroutine completed successfully");
      break;
   }
}

However, KM_info now comes back as -3, and that's a mystery to me. Any ideas why?

 
Aleksey Vyazmikin #:

Thanks, that really helped. It's strange, though, that no error is raised when the matrix is filled with data from the array.

This code

However, KM_info now comes back as -3, and that's a mystery to me. Any ideas why?

Look at this part of the code:
   if(zerosizeclusters)
     {
      //--- Some clusters have zero size - rare, but possible.
      //--- We'll choose new centres for such clusters using k-means++ rule
      //--- and restart algorithm
      if(!SelectCenterPPP(xy,npoints,nvars,ct,cbusy,k,d2,p,tmp))
        {
         info=-3;
         return;
        }
      continue;
     }

Probably because you only have 0s and 1s. Check it on more varied data first.

 
Forester #:

Look at this part of the code:
   if(zerosizeclusters)
     {
      //--- Some clusters have zero size - rare, but possible.
      //--- We'll choose new centres for such clusters using k-means++ rule
      //--- and restart algorithm
      if(!SelectCenterPPP(xy,npoints,nvars,ct,cbusy,k,d2,p,tmp))
        {
         info=-3;
         return;
        }
      continue;
     }

Probably because you only have 0s and 1s. Check it on more varied data first.

I checked it on varied numbers:

//---Create the sample
   ArrayResize(arr_Data_Load,Strok_Total_Data*Stolb_Total_Data);
   ArrayInitialize(arr_Data_Load,0);
   MathSrand(100);
   int Get_Rand=0;
   for(int i=0; i<Strok_Total_Data; i++)
      for(int j=0; j<Stolb_Total_Data; j++)
      {
         Get_Rand=MathRand();
         //if(Get_Rand>16000)arr_Data_Load[i*Stolb_Total_Data+j]=1;
         arr_Data_Load[i*Stolb_Total_Data+j]=Get_Rand;
      }

The result is the same :(

I must be doing something else wrong...

 
mytarmailS #:

you're dead wrong.

or you're wrong

He's damn right.

 
Aleksey Vyazmikin #:

I checked it on varied numbers

The result is the same :(

I must be doing something else wrong...

I have no other ideas. Try printing out part of MatrixLearn; maybe something is wrong there...
 
Forester #:
MatrixLearn

I printed it like this and got zeros:

   for(int row=0; row<N_Strok; row++)//number of rows
   {
      Print(MatrixLearn[row]);//row/column
   }

Did I print it correctly?

If that's correct, then the error is here:

MatrixLearn[row].Set(col,Input_arr_Data[N_Stolb*row+col]);//row/column

I wrote it by analogy with your code.
