Optimisation of algorithms. - page 4

 
MetaDriver:
Now I have a problem. I need to make efficient sorting of arrays with big "cost" of elements copying (i.e. elements of which are large, "heavy" structures, class objects, long strings, etc.). Common sense suggests that we should leave them in place, but sort instead some kind of pointers - indexes of cells of their original location. Here is the quote from https://www.mql5.com/ru/forum/6476#comment_178318Оставим so far, and let's implement it on mql5.

Everything has already been stolen before us :)

Spreadsheets in MQL5

The input should be a copy of the array to be sorted, comments should be annotated, and the unnecessary should be commented out

//+------------------------------------------------------------------+
//void QuickSort(double &A[],int &r[],int from,int to)
void QuickSort(double &A[],int from,int to)
  {
   int i,j;//,itemp;
   double x,temp;
   if(from>=to)return;
   i=from;
   j=to;
   x=A[(from+to)>>1];
   while(i<=j)
     {
      while(A[i]<x)i++; 
      while(A[j]>x)j--;
      if(i<=j)
        {
         temp=A[i]; A[i]=A[j]; A[j]=temp;
         //itemp=r[i]; r[i]=r[j]; r[j]=itemp;
         i++;
         j--;
        }
     }
   QuickSort(A,from,j);
   //QuickSort(A,r,from,j);
   QuickSort(A,i,to);  
   //QuickSort(A,r,i,to);
  }
//+------------------------------------------------------------------+
 
Urain:

Everything has already been stolen before us :)

Spreadsheets in MQL5

The input should be a copy of the array to be sorted, the comments should be annotated and the unnecessary commented out.

You're mean! You ruined the song... Or rather - tried to. :)

// It's clear that there are good approximations. I might as well have pulled a sample from the standard library. :)

There are patterns. And there are some excellent ones, too. But still. Here it's important to coding and debugging everything to a working condition of a separate usable product. Moreover (why I am sending it here) - as fast as possible. I.e. I suggest to squeeze all possible performance up to hundredths of a percent inclusive. :)

This is the first. Second, for objects we need the number of index arrays equal to the number of sorting criteria, which in general case can be several, + (preferably) function of insertion in sorted according to several criteria indexed array.

 
MetaDriver:

You're mean! You ruined the song... Or rather, you tried. :)

// It's clear that there are good approximations. I might as well have pulled a sample from the standard library. :)

There are samples. And even some great ones. But still. It's important to check and debug everything to a working condition of a separate usable product. And (why I'm sending it here) - the fastest one. I.e. I suggest to squeeze all possible performance up to hundredths of a percent inclusive. :)

That's the first thing. Second - for objects we need the number of index arrays equal to the number of sorting criteria, which in general may be several, + (preferably) function of insertion in sorted by several criteria indexed array.

Same answer Spreadsheets in MQL5.

It's all there. Under a concrete problem, it is possible to remake under manipulation not columns but rows, there are columns made for it is possible to declare them as different types. If the table type is the same, everything can be replayed.

 

Made an inluder with sorting indexes for base types.

bool ArrayIndexSort(const void &source[], int &index[], bool byDescending=true);

Default sorting is "descending", to sort in ascending direction, set sorting direction flag to false.

Test results: // indexing arrays double[], int[], string[]; sequentially : raw array, descending array, ascending array

2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     SArray[index[i]] = {MetaQuotes, Абырвалг, Газета, Колбаса, Компилятор, Молоко, Препроцессор, Яйцо}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     SArray[index[i]] = {Яйцо, Препроцессор, Молоко, Компилятор, Колбаса, Газета, Абырвалг, MetaQuotes}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     SArray[i]        = {Абырвалг, Газета, MetaQuotes, Колбаса, Молоко, Яйцо, Компилятор, Препроцессор}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     iarray[index[i]] = {89, 2393, 3742, 4772, 5098, 5364, 5560, 6226, 6758, 7029, 7245, 7540, 8097, 8195, 8908, 8945}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     iarray[index[i]] = {8945, 8908, 8195, 8097, 7540, 7245, 7029, 6758, 6226, 5560, 5364, 5098, 4772, 3742, 2393, 89}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     iarray[i]        = {8945, 7245, 8195, 5364, 8097, 5560, 5098, 3742, 89, 6758, 6226, 7029, 4772, 7540, 8908, 2393}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     darray[index[i]] = {0.29, 13.40, 16.11, 28.86, 31.05, 35.63, 38.71, 39.65, 47.79, 50.33, 50.40, 59.39, 63.31, 65.35, 70.65, 78.98}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     darray[index[i]] = {78.98, 70.65, 65.35, 63.31, 59.39, 50.40, 50.33, 47.79, 39.65, 38.71, 35.63, 31.05, 28.86, 16.11, 13.40, 0.29}
2012.04.08 05:04:46     IndexSort_Test (USDJPY,M30)     darray[i]        = {50.33, 47.79, 39.65, 70.65, 31.05, 16.11, 38.71, 78.98, 50.40, 35.63, 0.29, 63.31, 13.40, 59.39, 28.86, 65.35}

Library and test in trailer.

Put the indexer in the folder "MQL5\Include\Indexes\".

Files:
 

Here's a sample class for working with OCL. Of course, some things are incomplete and awkward, but maybe someone will find it useful.

//——————————————————————————————————————————————————————————————————————————————
//|                                                            class   OCL.mqh |
//|                                                Copyright 2011, JQS aka Joo |
//|                                        https://www.mql5.com/ru/users/joo |
//——————————————————————————————————————————————————————————————————————————————
#property copyright "Copyright 2011, JQS aka Joo"
#property link      "https://www.mql5.com/ru/users/joo"
//——————————————————————————————————————————————————————————————————————————————
class OCL
{
public:
  bool              DeviceInfo(int DeviceID);
  bool              InitOCL   (int DeviceID,uint &Offset[],uint &Work[],int BuffCount,int ArgCount,string Source);
  bool              BuffInit  (int buffID,int size,int regime);
  bool              Execute   (int work_dim);
  void              DeinitOCL ();
  //---
  int               BuffID[]; // идентификаторы буферов
  //----------------------------------------------------------------------------
private:
  int               cl_ctx;   // идентификатор контекста
  int               cl_prg;   // идентификатор программы
  int               cl_krn;   // идентификатор ядра
  //---
  int               ArguID[]; // идентификаторы аргументов
  uint              offset[]; 
  uint              work  [];
};
//——————————————————————————————————————————————————————————————————————————————
bool OCL::DeviceInfo(int DeviceID)
{
  long DeviceCount=CLGetInfoInteger(0,CL_DEVICE_COUNT);
  if(DeviceCount<1)
    return(false);
  else
  {
    string s="Всего устройств OCL: "+(string)CLGetInfoInteger(0,CL_DEVICE_COUNT)+" - ";
    long DeviceType=CLGetInfoInteger(DeviceID,CL_DEVICE_TYPE);

    switch(DeviceType)
    {
    case CL_DEVICE_ACCELERATOR:
      s=s+"Используется спец.OpenCL ускоритель (например, IBM CELL Blade)";
      break;
    case CL_DEVICE_CPU:
      s=s+"Используется CPU";
      break;
    case CL_DEVICE_GPU:
      s=s+"Используется GPU";
      break;
    case CL_DEVICE_DEFAULT:
      s=s+"Устройство по умолчанию";
      break;
    case CL_DEVICE_CUSTOM:
      s=s+"Специализированный ускоритель, не поддерживает OpenCL";
      break;
    default:
      s=s+"Непонятное устройство, скорее всего это какое то устройство по умолчанию";
      break;
    }
    Print(s);
    return(true);
  }
}
//——————————————————————————————————————————————————————————————————————————————
bool OCL::InitOCL(int DeviceID,uint &Offset[],uint &Work[], int BuffCount,int ArgCount,string Source)
{
  ArrayResize(offset,ArraySize(Offset)); ArrayCopy(offset,Offset,0,0,WHOLE_ARRAY);
  ArrayResize(work,  ArraySize(Work));   ArrayCopy(work,  Work,  0,0,WHOLE_ARRAY);

  if((cl_ctx=CLContextCreate(DeviceID))==-1)
  {
    Print("OpenCL не найден: ",GetLastError());
    return(false);
  }
//----------------------------------------------------------------------------
// Создаем OpenCL программу из исходного кода
  if((cl_prg=CLProgramCreate(cl_ctx,Source))==-1)
  {
    CLContextFree(cl_ctx);
    Print("Ошибка созания OpenCL-программы");
    switch(GetLastError())
    {
    case ERR_OPENCL_INVALID_HANDLE:
      Print("Некоректный хендл на программу OpenCL");
      break;
    case ERR_INVALID_PARAMETER:
      Print("Некоректный строковой параметр");
      break;
    case ERR_OPENCL_TOO_LONG_KERNEL_NAME:
      Print("Имя кернела содержит более 127 символов");
      break;
    case ERR_OPENCL_KERNEL_CREATE:
      Print("Внутренняя ошибка при создании объекта OpenCL.");
      break;
    default: Print("Неопознанная ошибка.");
    }
    return(false);
  }
  //----------------------------------------------------------------------------
  //Создаем точку входа в программу OpenCL  
  if((cl_krn=CLKernelCreate(cl_prg,"Work"))==-1)
  {
    CLProgramFree(cl_prg);
    CLContextFree(cl_ctx);
    Print("Ошибка создания OpenCL-ядра");
    return(false);
  }
  //----------------------------------------------------------------------------
  ArrayResize(BuffID,BuffCount);
  ArrayResize(ArguID,ArgCount);
  return(true);
}
//——————————————————————————————————————————————————————————————————————————————
void OCL::DeinitOCL()
{
  for(int i=0;i<ArraySize(BuffID);i++)
    CLBufferFree(BuffID[i]);

  CLKernelFree (cl_krn);
  CLProgramFree(cl_prg);
  CLContextFree(cl_ctx);
}
//——————————————————————————————————————————————————————————————————————————————
bool OCL::Execute(int work_dim)
{
  if(!CLExecute(cl_krn,work_dim,offset,work))
  {
    Print("Ошибка при расчете в OpenCL");
    return(false);
  }
  return(true);
}
//——————————————————————————————————————————————————————————————————————————————
bool OCL::BuffInit(int buffID,int size,int regime)
{
  if(buffID>=ArraySize(BuffID))
    return(false);

  BuffID[buffID]=CLBufferCreate(cl_ctx,size,regime);
  if(BuffID[buffID]==-1)
  {
    Print("OpenCL buffer create failed");
    switch(GetLastError())
    {
    case ERR_OPENCL_INVALID_HANDLE:
      Print("Нкоректный хендл на контекст OpenCL");
      break;
    case ERR_NOT_ENOUGH_MEMORY:
      Print("Недостаточно памяти");
      break;
    case ERR_OPENCL_BUFFER_CREATE:
      Print("Внутренняя ошибка создания буфера");
      break;
    default: Print("Неопознанная ошибка.");
    }
    return(false);
  }
  //Выставляем буфер OpenCL в качестве параметра функции OpenCL
  if(!CLSetKernelArgMem(cl_krn,buffID,BuffID[buffID]))
    return(false);
  return(true);
}
//——————————————————————————————————————————————————————————————————————————————


I've reworked the initialization a bit, now you can run multidimensional calculations.

 
That's good, I like that. Should I redo my C-like code?
 

Great topic!

Just now I faced an optimization problem with an algorithm for finding a price extremum (minimum). The conditions are as follows: there is a bar, n bars to the left and right of which are below (above) its maximum:

n is a free arbitrarily chosen value. The period n is always odd, because the sum of the two bars to the right and to the left will always be an even number to which the central bar of the price extremum proper is added.

I didn't think much about the first version of the algorithm and wrote the most obvious code. Now I am writing it in C# using WealthLab platform, but I think you can easily understand the essence of the problematic algorithm, here is the most problematic part of it:

//Перебираем все бары
for (int bar = 0; bar < MyQuotes.Count; bar++)
{
   //Подготовка данных
   ...
   int pperiod = (N - 1)/2;
   //Перебираем бары относительно потенциального экстремума
   for (int i = 1; i <= pperiod; i++)
   {
      if (MyQuotes.High[bar - i] > MyQuotes.High[bar] ||
         MyQuotes.High[bar + i] > MyQuotes.High[bar])
         up = false;
      if (MyQuotes.Low[bar - i] < MyQuotes.Low[bar] ||
         MyQuotes.Low[bar + i] < MyQuotes.Low[bar])
         down = false;
   }
}

The whole problem is in the second loop. It simultaneously handles left and right branches of a potential extremum and therefore goes through only (N - 1)/2 bars, but that is not enough. Measurements show that time spent on identifying an extremum in an arithmetic progression depends on the period N, which is very, very bad:

N         Time
3       00:00:00.5460312
4       00:00:00.4720270
5       00:00:00.4820276
6       00:00:00.4250243
7       00:00:00.4410252
8       00:00:00.4350249
9       00:00:00.4300246
10      00:00:00.4380251
11      00:00:00.4520258
12      00:00:00.4420253
13      00:00:00.4500257
14      00:00:00.4640266
15      00:00:00.5100291
16      00:00:00.4950284
17      00:00:00.5200297
18      00:00:00.5110292
19      00:00:00.5090291
20      00:00:00.5690326
21      00:00:00.5320304
22      00:00:00.5560318
23      00:00:00.5750329
24      00:00:00.5850335
25      00:00:00.6140351
26      00:00:00.6120350
27      00:00:00.6160352
28      00:00:00.6510373
29      00:00:00.6510372
30      00:00:00.6770387
31      00:00:00.6900395
32      00:00:00.7040403
33      00:00:00.7000400
34      00:00:00.7190411
35      00:00:00.7320419
36      00:00:00.7510430
37      00:00:00.7510429
38      00:00:00.8290474
39      00:00:00.7760444
40      00:00:00.8080463
41      00:00:00.7990457
42      00:00:00.8240471
43      00:00:00.8460484
44      00:00:00.8690497
45      00:00:00.8680496
46      00:00:00.9120522
47      00:00:00.8870507
48      00:00:00.9520545
49      00:00:00.9230528
50      00:00:00.9430539
51      00:00:00.9460541
52      00:00:00.9590549
53      00:00:00.9750558
54      00:00:00.9920567
55      00:00:01.0010573
56      00:00:01.0440597
57      00:00:01.0400595
58      00:00:01.0610607
59      00:00:01.0610606
60      00:00:01.0860622
61      00:00:01.0730613
62      00:00:01.1170639
63      00:00:01.1400652
64      00:00:01.1370651
65      00:00:01.1190640
66      00:00:01.1960684
67      00:00:01.1740671
68      00:00:01.2110693
69      00:00:01.2490715
70      00:00:01.3010744
71      00:00:01.2750730
72      00:00:01.3090748
73      00:00:01.3000744
74      00:00:01.3060747
75      00:00:01.3610779
76      00:00:01.3740785
77      00:00:01.4190812
78      00:00:01.3500772
79      00:00:01.3350764
80      00:00:01.3530774
81      00:00:01.4690840
82      00:00:01.4270816
83      00:00:01.3870794
84      00:00:01.4250815
85      00:00:01.4250815
86      00:00:01.4500829
87      00:00:01.4600835
88      00:00:01.4630837
89      00:00:01.4500830
90      00:00:01.5060861
91      00:00:01.4930854
92      00:00:01.5340878
93      00:00:01.5620893
94      00:00:01.5470885
95      00:00:01.5450884
96      00:00:01.5730899
97      00:00:01.5640895
98      00:00:01.5840906
99      00:00:01.5970914

Trying through the periods will take the time to sum up the arithmetic progression, and this is a very large value.

One possible solution is to introduce an additional variable. After all, if an extremum is identified, it is guaranteed that there is no bar to its right for (N - 1)/2 so a new extremum can be identified starting with bar: current_bar + (N - 1)/2. However, extrema need to be identified along with minima and a new minimum can be found before current_bar + (N - 1)/2. It will therefore be necessary to split the search for extrema and minima into two passes which will reduce the performance gain to zero. We can easily split two passes into two threads processed simultaneously on two cores in C# but I would like to find the optimal algorithm and optimize it first of all. I am waiting for the help of experts.

 
C-4: I am just now faced with the optimization problem of the algorithm for searching for a price extremum (minimum).

Well this is not an optimization problem.

Trying through the periods will take the sum of the arithmetic progression, and this is a very large value.

Finding an extremum seems to be a problem of the order of O(n), where n is the number of data. How you can make this asymptotic worse, i.e. O(n^2) - I can't even imagine. Or you're confusing the terms.

The simplest algorithm is ArraySort(), built-in fast enough, something around O(n * ln( n ) ). But it's probably redundant for this problem.

You could come up with something recursive that would be faster.

How long does it take to find the minimum and for how many bars? Well, I don't believe that for 100 bars you're searching for a maximum of one and a half seconds.

 
Mathemat:

The simplest algorithm is ArraySort(), the built-in sorting is fast enough. But it is probably redundant for this task.

The best sorting is O(n*log(n)). Exactly redundant.

We could come up with something recursive which would be faster.

Slower. Recursion is most often evil. Recursive? This is probably a case where no matter how you do it, the speed will be about the same.

By code:

//Перебираем все бары
for (int bar = 0; bar < MyQuotes.Count; bar++)
{
   //Подготовка данных
   ...
   int pperiod = (N - 1)/2;
   //Перебираем бары относительно потенциального экстремума
   for (int i = 1; i <= pperiod; i++)
   {
      if (MyQuotes.High[bar - i] > MyQuotes.High[bar] ||
         MyQuotes.High[bar + i] > MyQuotes.High[bar])
         up = false;
      if (MyQuotes.Low[bar - i] < MyQuotes.Low[bar] ||
         MyQuotes.Low[bar + i] < MyQuotes.Low[bar])
         down = false;
   }
}

Cycles for min and max must be explicitly separated. And exit the loop immediately if it fails.

 
TheXpert: This is probably the case where no matter how you do it, the speed will be about the same.

In principle, yes. But still no more than O(n).

OCL would help here. The asymptotics will remain the same, of course. But the speed could well be increased by a hundred times.

Reason: