Machine learning in trading: theory, models, practice and algo-trading - page 3279

 
Grigori.S.B #:

And rightly so.

It's going to be sad news from the real world.

 

I'm trying to quickly find similar short strings in a long string.

Am I using Alglib optimally here?

#include <Math\Alglib\statistics.mqh> // https://www.mql5.com/ru/code/11077

const vector<double> GetCorr( const CMatrixDouble &Matrix, const vector<double> &Pattern )
{
  CMatrixDouble Vector;
  CMatrixDouble Corr;
  
  Vector.Col(0, Pattern);

  CBaseStat::PearsonCorrM2(Vector, Matrix, Matrix.Rows(), 1, Matrix.Cols(), Corr);
  
  return(Corr.Row(0));
}

#property script_show_inputs

input int inRows = 300; // Length of the short string
input int inCols = 1000000; // Length of the long string

void FillArray( double &Array[], const int Amount )
{
  for (uint i = ArrayResize(Array, Amount); (bool)i--;)
    Array[i] = MathRand();
}

void FillMatrix( CMatrixDouble &Matrix, const double &Array[], const int Rows )
{
  Matrix.Resize(Rows, ArraySize(Array) + 1 - Rows);

  double ColArray[];
  vector<double> Vector;
  
  for (uint i = (uint)Matrix.Cols(); (bool)i--;)
  {
    ArrayCopy(ColArray, Array, 0, i, Rows);
    Vector.Swap(ColArray);
    
    Matrix.Col(i, Vector);
  }
}

void FillData( double &Array[], double &Pattern[], CMatrixDouble &Matrix, const int Rows, const int Cols )
{
  FillArray(Array, Cols + Rows - 1);
  FillArray(Pattern, Rows);

  FillMatrix(Matrix, Array, Rows);    
}

#define  TOSTRING(A) #A + " = " + (string)(A) + " "

// Search for a similar string within a long string.
void OnStart()
{  
  if (inRows < inCols)
  {
    PrintCPU(); // https://www.mql5.com/ru/forum/86386/page3256#comment_49538685
    
    double Array[]; // Long string to search in.
    double Pattern[]; // Short string to compare against.
    CMatrixDouble Matrix;
    
    FillData(Array, Pattern, Matrix, inRows, inCols); // Fill in the data.
            
    Print(TOSTRING(inRows) + TOSTRING(inCols));

    vector<double> vPattern;  
    vPattern.Assign(Pattern);

    ulong StartTime, StartMemory; // https://www.mql5.com/ru/forum/86386/page3256#comment_49538685

    BENCH(vector<double> Vector1 = GetCorr(Matrix, vPattern))
//    BENCH(vector<double> Vector2 = GetCorr(Array, Pattern))
  
//    Print(TOSTRING(IsEqual(Vector1, Vector2)));
  }      
}


Result.

EX5: 4000 AVX Release.
TerminalInfoString(TERMINAL_CPU_NAME) = Intel Core i7-2700 K  @ 3.50 GHz 
TerminalInfoInteger(TERMINAL_CPU_CORES) = 8 
TerminalInfoString(TERMINAL_CPU_ARCHITECTURE) = AVX 
inRows = 300 inCols = 1000000 
vector<double> Vector1 = GetCorr(Matrix, vPattern) - 6725703 mcs, 8 MB

This Alglib implementation takes more than six seconds to search a string of one million values for short strings (length 300) similar to the pattern. Can NumPy do this?
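For comparison, here is a minimal NumPy sketch of the same calculation. It has not been timed here, so it says nothing about whether NumPy is faster. The function name sliding_pearson is hypothetical. It assumes the long string and the pattern are plain 1-D arrays and that no window is constant. Instead of building the 300 x 1000000 matrix, it takes the sliding dot product from np.correlate and the window sums from cumulative sums.

```python
import numpy as np

def sliding_pearson(series, pattern):
    """Pearson correlation of `pattern` with every window of `series`.

    Hypothetical helper: returns len(series) - len(pattern) + 1 values,
    one per starting offset, without materialising the window matrix.
    """
    series = np.asarray(series, dtype=np.float64)
    pattern = np.asarray(pattern, dtype=np.float64)
    m = pattern.size

    # Centre the pattern once. Because the centred pattern sums to zero,
    # sum((w - mean(w)) * p) equals sum(w * p) for any window w.
    p = pattern - pattern.mean()
    p_norm = np.sqrt(p @ p)

    # Sliding dot product: num[k] = sum_j series[k + j] * p[j]
    num = np.correlate(series, p, mode="valid")

    # Window sums and sums of squares from cumulative sums.
    c1 = np.concatenate(([0.0], np.cumsum(series)))
    c2 = np.concatenate(([0.0], np.cumsum(series * series)))
    s1 = c1[m:] - c1[:-m]
    s2 = c2[m:] - c2[:-m]

    # sum((w - mean(w))**2) = sum(w**2) - sum(w)**2 / m; clip rounding noise.
    w_norm = np.sqrt(np.maximum(s2 - s1 * s1 / m, 0.0))

    return num / (w_norm * p_norm)
```

Calling sliding_pearson(Array, Pattern) on arrays filled like the ones in the script above should give the same vector of coefficients as GetCorr, up to rounding. The cumulative sums lose some precision on very long inputs with large values, so treat this as a sketch rather than a drop-in replacement.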

 
fxsaber #:

Trying to quickly find similar short strings in a long string.

Am I using Alglib optimally here?


Result.

This Alglib implementation takes more than eight seconds to search a string of one million values for short strings (length 300) similar to the pattern. Can NumPy do this?

And how will you evaluate the obtained matrix? I don't understand the principle of such evaluation.

 
Forester #:
And how will you evaluate the resulting 300*1000000 matrix? I don't understand the principle of such estimation.

  1. There is a long string of 1 000 000 values.
  2. The values at positions [0..299] are taken and put into the first column of the 300x1000000 matrix.
  3. The values at positions [1..300] are taken and put into the second column of the 300x1000000 matrix.
  4. And so on.
The correlation of this matrix with a pattern of length 300 is then calculated. The output is a vector of one million corresponding Pearson coefficients.
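
To make the steps above concrete, here is a small pure-Python sketch of the same idea on toy data. The names pearson and sliding_corr_naive are hypothetical and the numbers are made up for illustration. It is a slow reference version, not the Alglib call: each window plays the role of one matrix column and is correlated with the pattern directly.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def sliding_corr_naive(series, pattern):
    """Correlate `pattern` with every window of len(pattern) in `series`."""
    m = len(pattern)
    # Column k of the matrix described above is series[k : k + m].
    return [pearson(series[k:k + m], pattern)
            for k in range(len(series) - m + 1)]

# Toy data: the pattern is 2 * series[3:6], so the match is at offset 3.
series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
pattern = [2.0, 10.0, 18.0]

corr = sliding_corr_naive(series, pattern)
best = max(range(len(corr)), key=corr.__getitem__)
print(best, corr[best])
```

The most similar place in the long string is then simply the offset with the largest coefficient.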
 
fxsaber #:
This Alglib implementation takes more than six seconds to search a string of one million values for strings similar to a short string.

I also average about 6 seconds.

I've done several runs.

> system.time({
+   find_cor(y,x)
+ })
   user  system elapsed 
   4.15    0.03    5.70 
> system.time({
+   find_cor(y,x)
+ })
   user  system elapsed 
   4.38    0.02    5.16 
> system.time({
+   find_cor(y,x)
+ })
   user  system elapsed 
   4.18    0.01    6.10 
> system.time({
+   find_cor(y,x)
+ })
   user  system elapsed 
   4.08    0.00    5.99 

But I did it in the most ordinary way; I didn't look for any clever solutions.

 
mytarmailS #:

I'm averaging about six seconds, too.

I've done a few runs

but I did it the usual way, I didn't look for any rocket science solutions.

What kind of R do you have?

Microsoft's R uses Intel's library for vectors and matrices....

 
СанСаныч Фоменко #:

What's your R?

Microsoft R uses Intel's library for vectors and matrices....

regular...

but I wrote the function in C++ and call it from R.

 
mytarmailS #:

ordinary

I wonder whether Microsoft R + Intel is just for show or really faster?

 
СанСаныч Фоменко #:

I wonder whether Microsoft R + Intel is just for show or really faster?

I've never tried it. I'm curious too.

But I'm interested in a general speedup for any operations, not just matrices and vectors.

 
mytarmailS #:

I've never tried it, I'm curious too.

But I'm interested in a general speedup for any operations, not just matrices and vectors.

increase memory size by any of the known ways

including trying to use the shadow area of RAM plus (under BIOS)

increase the processor bit rate

increase the speed of access to the hard disk (for example, allocate part of RAM for the file with the processed data, i.e. create a RAM disk)

match all of the computer's hardware to the data bus frequency

process the task in several parallel threads
