
Maxim Dmitrievsky 2023.10.04 00:06 #32801

fxsaber #:

Well, you need Pearson.

I'm not sure how to do it, and I'm sleepy.

Something similar.

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([1, 2, 3])
>>> a = (a - np.mean(a)) / (np.std(a))
>>> b = (b - np.mean(b)) / (np.std(b))
>>> np.correlate(a, b, 'full')
array([-1.8973666 , -1.42302495,  0.9486833 ,  0.9486833 ,  0.9486833 ,
        0.9486833 ,  0.9486833 ,  0.9486833 ,  0.9486833 , -1.42302495,
       -1.8973666 ])
>>>
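For reference: the snippet above normalizes each array once over its whole length, so the values it prints are plain dot products, not Pearson coefficients (several exceed 1 in magnitude). A Pearson coefficient per alignment needs the mean and spread of each window of `a` separately. A minimal pure-Python sketch of that naive O(n·m) version follows. It assumes only "valid" alignments (pattern fully inside the long sequence) are wanted, and the name `sliding_pearson` is hypothetical, not part of NumPy or Alglib.

```python
import math

# Sketch only: sliding_pearson is a hypothetical helper name.
def sliding_pearson(a, b):
    """Pearson correlation of pattern b with every window of a of the same length.

    Naive O(len(a) * len(b)) version; only windows lying entirely inside a
    are returned ('valid' alignment).
    """
    m = len(b)
    if m < 2 or m > len(a):
        raise ValueError("pattern must have at least 2 elements and fit inside a")
    mean_b = sum(b) / m
    db = [x - mean_b for x in b]                  # centred pattern
    norm_b = math.sqrt(sum(x * x for x in db))
    result = []
    for i in range(len(a) - m + 1):
        w = a[i:i + m]
        mean_w = sum(w) / m                       # mean of this window only
        dw = [x - mean_w for x in w]              # window centred on its own mean
        norm_w = math.sqrt(sum(x * x for x in dw))
        if norm_w == 0 or norm_b == 0:
            result.append(float("nan"))           # undefined for constant data
        else:
            result.append(sum(x * y for x, y in zip(dw, db)) / (norm_w * norm_b))
    return result

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [1, 2, 3]
print(sliding_pearson(a, b))
```

For this input every window of `a` is an exact linear function of `b`, so each of the seven coefficients is 1.0 up to floating-point rounding, which the global normalization above does not produce.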

fxsaber 2023.10.04 00:11 #32802

Maxim Dmitrievsky #:

I'm not sure how to do it, and I'm sleepy.

Something similar.

Yeah, that's not it.

Maxim Dmitrievsky 2023.10.04 00:13 #32803

fxsaber #:

Yeah, that's not it.

It's close to it; look into it yourself, I'm off.

fxsaber 2023.10.04 13:36 #32804

fxsaber #:

Trying to quickly find sequences similar to a short one within a long sequence.

With such an implementation via Alglib, searching a one-million-element sequence for matches to a short sequence (300 elements) takes more than six seconds.

I sped it up.

#include <fxsaber\Math\Math.mqh> // https://www.mql5.com/ru/code/17982

const vector<double> GetCorr( const double &Array[], const double &Pattern[], const int Step = 1 )
{
  double Corr[];  
  MathCorrelationPearson(Array, Pattern, Corr, Step);
  
  ArrayRemove(Corr, 0, ArraySize(Pattern) - 1);  
  
  vector<double> Res;
  Res.Swap(Corr);
  
  return(Res);
}

#property script_show_inputs

input int inRows = 300; // Length of the short sequence
input int inCols = 1000000; // Length of the long sequence

// Search for similar sequences within the long sequence.
void OnStart()
{  
  if (inRows < inCols)
  {
    PrintCPU(); // https://www.mql5.com/ru/forum/86386/page3256#comment_49538685
    
    double Array[]; // Long sequence to search in.
    double Pattern[]; // Short sequence (pattern) to compare against.
    CMatrixDouble Matrix;
    
    FillData(Array, Pattern, Matrix, inRows, inCols); // https://www.mql5.com/ru/forum/86386/page3278#comment_49725614
            
    Print(TOSTRING(inRows) + TOSTRING(inCols));

    vector<double> vPattern;  
    vPattern.Assign(Pattern);

    ulong StartTime, StartMemory; // https://www.mql5.com/ru/forum/86386/page3256#comment_49538685

    BENCH(vector<double> Vector1 = GetCorr(Matrix, vPattern)) // https://www.mql5.com/ru/forum/86386/page3278#comment_49725614
    BENCH(vector<double> Vector2 = GetCorr(Array, Pattern))
    BENCH(vector<double> Vector3 = GetCorr(Array, Pattern, -1))
    
    Print(TOSTRING(IsEqual(Vector1, Vector2)));
    Print(TOSTRING(IsEqual(Vector3, Vector2)));
  }      
}

Result:

EX5: 4000 AVX Release.
TerminalInfoString(TERMINAL_CPU_NAME) = Intel Core i7-2700 K  @ 3.50 GHz 
TerminalInfoInteger(TERMINAL_CPU_CORES) = 8 
TerminalInfoString(TERMINAL_CPU_ARCHITECTURE) = AVX 
inRows = 300 inCols = 1000000 
vector<double> Vector1 = GetCorr(Matrix, vPattern) - 7158396 mcs, 8 MB
vector<double> Vector2 = GetCorr(Array, Pattern) - 364131 mcs, 8 MB
vector<double> Vector3 = GetCorr(Array, Pattern, -1) - 323935 mcs, 7 MB
IsEqual(Vector1, Vector2) = true 
IsEqual(Vector3, Vector2) = true

Now it takes about 300 milliseconds.
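How `MathCorrelationPearson` achieves this speed-up is not shown in the post. One common technique, sketched below purely as an assumption, is to take each window's sum and sum of squares from prefix sums in O(1), so the only per-window O(m) work left is the dot product with the centred pattern. That dot product equals the covariance numerator because the centred pattern sums to zero. The function name is hypothetical.

```python
import math

# Sketch only: the function name is hypothetical; this is not the
# MathCorrelationPearson algorithm, whose internals are not shown in the post.
def sliding_pearson_prefix(a, b):
    """Pearson correlation of pattern b with every full-length window of a.

    Window sums and sums of squares come from prefix sums in O(1) per window;
    only the dot product with the centred pattern costs O(len(b)) per window.
    """
    n, m = len(a), len(b)
    if m < 2 or m > n:
        raise ValueError("pattern must have at least 2 elements and fit inside a")
    mean_b = sum(b) / m
    db = [x - mean_b for x in b]                   # centred pattern, sums to zero
    norm_b = math.sqrt(sum(x * x for x in db))
    # s1[k] = a[0] + ... + a[k-1], s2[k] = a[0]**2 + ... + a[k-1]**2
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for k, x in enumerate(a):
        s1[k + 1] = s1[k] + x
        s2[k + 1] = s2[k] + x * x
    result = []
    for i in range(n - m + 1):
        sw = s1[i + m] - s1[i]
        var_w = (s2[i + m] - s2[i]) - sw * sw / m   # sum of squared deviations in the window
        num = sum(a[i + j] * db[j] for j in range(m))  # covariance numerator
        if var_w <= 0 or norm_b == 0:
            result.append(float("nan"))            # constant window or pattern
        else:
            result.append(num / (math.sqrt(var_w) * norm_b))
    return result

print(sliding_pearson_prefix([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3]))
```

Prefix sums of squares can lose precision on long series with large values (cancellation in `s2 - sw*sw/m`); centring or rescaling the data before the search reduces that risk.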

fxsaber 2023.10.04 13:41 #32805

fxsaber #:

Now it takes about 300 milliseconds.

Here is a case that no matrix can handle:

inRows = 30000 inCols = 10000000 
vector<double> Vector2 = GetCorr(Array, Pattern) - 10567928 mcs, 76 MB
vector<double> Vector3 = GetCorr(Array, Pattern, -1) - 3006838 mcs, 77 MB

Finding sequences similar to a 30K-element pattern in a 10M-element sequence takes three seconds.

mytarmailS 2023.10.04 13:58 #32806

fxsaber #:

When no matrix can handle it.

Finding sequences similar to a 30K-element pattern in a 10M-element sequence takes three seconds.

Very cool, but just as useless.

Is this done with fft()?

fxsaber 2023.10.04 14:09 #32807

mytarmailS #:

Is this done with fft()?

The 300/1M case doesn't use fft; the 30K/10M case does.
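The reply does not say how FFT is used. The standard trick, sketched here as an assumption and not as the author's code, is that the per-window dot products with the centred pattern form a cross-correlation, which one FFT-based convolution computes for all windows at once in O((n + m) log(n + m)) instead of O(n·m). That matters for a 30K-element pattern and much less for a 300-element one, which is consistent with FFT being used only in the larger case. Window sums still come from prefix sums. The recursive FFT below is a toy for self-containment; real code would use a library FFT such as NumPy's `numpy.fft.rfft`/`irfft`. All function names are hypothetical.

```python
import cmath
import math

# Sketch only: fft, convolve and sliding_pearson_fft are hypothetical helpers.
def fft(x, invert=False):
    """Recursive radix-2 FFT of a complex list whose length is a power of two.
    With invert=True it returns the inverse transform without the 1/n factor."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def convolve(u, v):
    """Full linear convolution of two real sequences computed via FFT."""
    size = len(u) + len(v) - 1
    n = 1
    while n < size:
        n *= 2  # zero-pad to a power of two >= size, so no circular wrap-around
    fu = fft([complex(x) for x in u] + [0j] * (n - len(u)))
    fv = fft([complex(x) for x in v] + [0j] * (n - len(v)))
    prod = fft([p * q for p, q in zip(fu, fv)], invert=True)
    return [z.real / n for z in prod[:size]]

def sliding_pearson_fft(a, b):
    """Pearson correlation of pattern b with every full window of a.
    Window dot products come from one FFT convolution; window sums from prefix sums."""
    n, m = len(a), len(b)
    if m < 2 or m > n:
        raise ValueError("pattern must have at least 2 elements and fit inside a")
    mean_b = sum(b) / m
    db = [x - mean_b for x in b]
    norm_b = math.sqrt(sum(x * x for x in db))
    # dot(a[i:i+m], db) == convolve(a, reversed db)[i + m - 1]
    conv = convolve(a, db[::-1])
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for k, x in enumerate(a):
        s1[k + 1] = s1[k] + x
        s2[k + 1] = s2[k] + x * x
    result = []
    for i in range(n - m + 1):
        sw = s1[i + m] - s1[i]
        var_w = (s2[i + m] - s2[i]) - sw * sw / m
        if var_w <= 0 or norm_b == 0:
            result.append(float("nan"))  # constant window or pattern
        else:
            result.append(conv[i + m - 1] / (math.sqrt(var_w) * norm_b))
    return result

print(sliding_pearson_fft([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3]))
```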

Aleksey Vyazmikin 2023.10.04 14:27 #32808

fxsaber #:

When no matrix can handle it.

Finding sequences similar to a 30K-element pattern in a 10M-element sequence takes three seconds.

Impressive result!

Aleksey Vyazmikin 2023.10.04 14:48 #32809

I took a sample from 2010 to 2023 (47K rows), split it into three parts in chronological order, and decided to see what happens if these parts are swapped.

Subsample sizes: train 60%, test 20%, exam 20%.

I made the following combinations; (-1) is the standard, chronological order. Each subsample has its own colour.

I trained 101 models with different seeds for each sample set and got the following result.

All metrics are standard. As can be seen, it is difficult to determine either the average profit of the models (AVR Profit) or the percentage of models whose profit exceeds 3000 points on the last sample, the one that did not participate in training.

Perhaps the relative success of variants -1 and 0 comes down to the size of the training sample? In general, Recall seems to react to this.

In your opinion, should the results of such combinations be comparable to each other in our case? Or is the data irretrievably outdated?
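The actual combinations were shown in an image that is not reproduced here, so the following is only a hypothetical reconstruction of the setup: keep the 60/20/20 sizes and place the three roles along the timeline in every possible order, giving contiguous row ranges per role. The helper name `role_orderings` and the use of all 3! = 6 orders are assumptions, and 47000 stands in for the approximate "47K rows".

```python
from itertools import permutations

# Sketch only: role_orderings is a hypothetical helper, and enumerating all
# 3! = 6 orders is an assumption about which combinations were tried.
def role_orderings(n_rows, fractions):
    """For each order of the roles along the timeline, return contiguous
    (start, stop) row ranges per role. The last block absorbs rounding."""
    layouts = []
    for order in permutations(fractions):
        ranges = {}
        start = 0
        for k, role in enumerate(order):
            last = k == len(order) - 1
            stop = n_rows if last else start + round(n_rows * fractions[role])
            ranges[role] = (start, stop)
            start = stop
        layouts.append((order, ranges))
    return layouts

for order, ranges in role_orderings(47000, {"train": 0.6, "test": 0.2, "exam": 0.2}):
    print(" -> ".join(order), ranges)
```

The first layout is the chronological train -> test -> exam order. In any order where the exam block is not last, the model is trained partly on data that comes after the exam period.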


СанСаныч Фоменко 2023.10.04 15:29 #32810

Aleksey Vyazmikin #:

I took a sample from 2010 to 2023 (47K rows), split it into three parts in chronological order, and decided to see what happens if these parts are swapped.

Subsample sizes: train 60%, test 20%, exam 20%.

I made the following combinations; (-1) is the standard, chronological order. Each subsample has its own colour.

I trained 101 models with different seeds for each sample set and got the following result.

All metrics are standard. As can be seen, it is difficult to determine either the average profit of the models (AVR Profit) or the percentage of models whose profit exceeds 3000 points on the last sample, the one that did not participate in training.

Perhaps the relative success of variants -1 and 0 comes down to the size of the training sample? In general, Recall seems to react to this.

In your opinion, should the results of such combinations be comparable to each other in our case? Or is the data irretrievably outdated?

Yet another homebrew approach...

There is cross-validation: everything has been chewed over again and again..., it is widely used....
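A minimal sketch of one cross-validation variant that suits time-ordered data: blocked k-fold, with contiguous, unshuffled folds, each used once as the validation block. Whether this is the variant the reply has in mind is an assumption, and `blocked_kfold` is a hypothetical name. The fold sizes follow the same rule as scikit-learn's `KFold` with `shuffle=False` (the first `n % k` folds get one extra row).

```python
# Sketch only: blocked_kfold is a hypothetical helper name.
def blocked_kfold(n_rows, k):
    """Yield (train_indices, valid_indices) for k contiguous, non-shuffled folds.
    The first n_rows % k folds get one extra row."""
    if not 2 <= k <= n_rows:
        raise ValueError("need 2 <= k <= n_rows")
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        valid = list(range(start, stop))                             # contiguous block
        train = list(range(0, start)) + list(range(stop, n_rows))    # everything else
        yield train, valid
        start = stop

for train, valid in blocked_kfold(10, 3):
    print(valid[0], valid[-1], len(train))  # prints: 0 3 6 / 4 6 7 / 7 9 7
```

With time series, folds that train on rows after the validation block use future data; leaving a gap around the validation block, or using a forward-only scheme such as scikit-learn's `TimeSeriesSplit`, are common mitigations.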

Machine learning in trading: theory, models, practice and algo-trading - page 3281