Numerical series density

 
Vladimir:
I remembered one way of clustering. It goes something like this: you pick out groups of elements (clusters) in a set such that the maximum distance between elements of one cluster is less than the minimum distance from any element of that cluster to any element outside it. The distance can be the ordinary one, the absolute value of the difference of two real numbers. Of course, there will not necessarily be only one such cluster. Maybe you don't need exactly one cluster; maybe you should also compare the clusters with each other in other ways, for example by the average time at which a level in the group occurs.
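The cluster condition can be checked directly for any chosen subset. The sketch below is in C++ rather than MQL4 so it runs outside the terminal; the name `isCluster` and the boolean-mask representation of the group are illustrative assumptions, and the check is plain brute force over all pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Checks the cluster condition for the elements of `values` marked in `inGroup`:
// the largest distance between two members must be smaller than the smallest
// distance from a member to any non-member. Distance is |a - b|.
// Brute force, O(n^2); the whole set is trivially a cluster (no outside points).
bool isCluster(const std::vector<double>& values, const std::vector<bool>& inGroup) {
    assert(values.size() == inGroup.size());
    double maxInside = 0.0;
    double minOutside = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!inGroup[i]) continue;
        for (std::size_t j = 0; j < values.size(); ++j) {
            if (j == i) continue;
            const double d = std::fabs(values[i] - values[j]);
            if (inGroup[j]) maxInside = std::max(maxInside, d);
            else            minOutside = std::min(minOutside, d);
        }
    }
    return maxInside < minOutside;
}
```

With the thread's numbers, the group 50, 51, 52, 53, 54 passes (inner spread 4, nearest outsider 60 is 6 away), while 51 to 54 alone fails because 50 is only 1 away from 51.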

This is interesting, but so far I can't figure out how to determine which numbers belong to the cluster. By brute force? Then, I suppose, there will be groups that overlap each other: if we define a cluster by looking for the smallest inner delta that is still larger with respect to the other elements, then dropping an element from such a cluster will shift the cluster. The distance between clusters will matter; if it is significant, it should work out.
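For numbers on the real line, brute force is cheap. Once the numbers are sorted, any cluster must be a contiguous run: an outside number lying between the smallest and largest member would be no farther from the smallest member than the cluster's own spread. So it is enough to test every run a[i..j] against its two outside neighbours. Partial overlaps cannot occur either: if clusters A and B shared a point b, with a only in A and c only in B, the condition for A gives |b-c| > |a-b| and the condition for B gives |a-b| > |b-c|, a contradiction; two clusters are therefore either disjoint or nested. A minimal C++ sketch (the name `findClusters` is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Enumerates every cluster (max inner distance < distance to every outside point)
// in ascending-sorted data. Each cluster is a contiguous run a[i..j], so only the
// nearest outside neighbour on each side has to be checked. O(n^2) runs.
// Returns index pairs {i, j}; single points and the whole set are skipped.
std::vector<std::pair<std::size_t, std::size_t>>
findClusters(const std::vector<double>& a) {
    assert(std::is_sorted(a.begin(), a.end()));
    std::vector<std::pair<std::size_t, std::size_t>> clusters;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (i == 0 && j == n - 1) continue;          // whole set: trivial
            const double spread = a[j] - a[i];           // largest inner distance
            bool ok = true;
            if (i > 0 && a[i] - a[i - 1] <= spread) ok = false;      // left outsider
            if (j + 1 < n && a[j + 1] - a[j] <= spread) ok = false;  // right outsider
            if (ok) clusters.push_back({i, j});
        }
    }
    return clusters;
}
```

For the 20 numbers used later in the thread (10 ... 260) this finds four clusters: 50...54, 120...150, 223...232 and 250...260. The largest one, or several of equal size, can then be picked from that list.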

Vladimir:

We all measure the distance between two points on the real axis as the absolute value of their difference. In mathematics this is called a metric. How do we measure the distance in the plane between points, that is, between pairs of real numbers? Again there is a familiar ready-made solution: the Euclidean distance, the square root of the sum of the squares of the coordinate differences. Mathematicians also have other metrics in the plane, e.g. the larger of the two absolute differences, or the sum of the absolute differences (http://ad.cctpu.edu.ru/Math_method/math/45.htm). And that is only for pairs of numbers: exactly two numbers, always two. You, however, need to introduce a proximity measure in a much more complicated situation: a group does not consist of just two numbers, and different groups contain different numbers of elements.
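The three metrics mentioned above, sketched in C++ for points given as coordinate vectors (the function names are illustrative). Each one asserts that both points have the same number of coordinates, which is exactly the precondition that groups of different sizes cannot meet.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean metric: square root of the sum of squared coordinate differences.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Maximum (Chebyshev) metric: the largest absolute coordinate difference.
double chebyshev(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}

// Manhattan (taxicab) metric: the sum of absolute coordinate differences.
double manhattan(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}
```

For the points (0, 0) and (3, 4) these give 5, 4 and 7 respectively.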

We need to identify the largest group, or groups with the same number of elements. My weakness is that I can't read complex formulas properly, so I have to understand everything from examples and the comments on them.

Vladimir:

There are metrics in mathematics that measure the distance between two functions. But again, always between two, so again not suitable for you: you have a group.

That's why it's important that you understand it thoroughly yourself. Write it out; maybe we can formalise it into an algorithm that produces a numerical measure of proximity within a set.

However, also consider giving up the attempt to create one. The link above lists the requirements a metric must satisfy. They did not appear out of the blue: without any one of them, strange effects occur. In the post above I gave an example of how to avoid such all-encompassing attempts: let the points of a group be closer to each other, pairwise on the real axis, than to any element outside the group. Then you would not have to invent anything very non-trivial.
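For reference, the standard requirements on a metric $d$ (presumably what the linked page lists), followed by the cluster condition from the earlier post written for the ordinary distance $|a-b|$. Here $X$ is the whole set of numbers and $G \subset X$ the group; the symbols are chosen for illustration.

```latex
\begin{aligned}
&d(x,y) \ge 0, \quad d(x,y) = 0 \iff x = y && \text{(identity)}\\
&d(x,y) = d(y,x) && \text{(symmetry)}\\
&d(x,z) \le d(x,y) + d(y,z) && \text{(triangle inequality)}\\[4pt]
&\max_{a,b \in G} |a-b| \;<\; \min_{a \in G,\; c \in X \setminus G} |a-c| && \text{(cluster condition)}
\end{aligned}
```

The last line only compares two numbers computed from the group itself, so it needs no distance between groups at all, which is the point of the suggestion.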

That's right: first we determine the proximity of two points, and then we try to exclude the distances that are large. The question is how to decide whether a distance is large or not. This is where the algorithm has failed so far: when one distance turns out to be an order of magnitude larger than the others.

 
Dmitry Fedoseev:
Didn't I write it? Count the differences first. Then everything else.
So you have counted the differences in the "Delta" column; what do you suggest we do next?
 

I am testing the following algorithm to filter the data before processing:

1. Sum each two consecutive deltas and multiply the result by two

2. Find the mean of the resulting numerical series

3. Build a new numerical series from the values below the mean

4. Repeat steps 2-3 until the numerical series is shorter than half of the original series


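(d1 and d2 are the two consecutive deltas ending at the row's number; each further column keeps the values below the mean of the column before it; the means are in the last row.)

No.  Number  Delta  2x(d1+d2)  < 53.33  < 25.82
1    10
2    20      10
3    30      10     40         40
4    40      10     40         40
5    50      10     40         40
6    51      1      22         22       22
7    52      1      4          4        4
8    53      1      4          4        4
9    54      1      4          4        4
10   60      6      14         14       14
11   70      10     32         32
12   80      10     40         40
13   120     40     100
14   150     30     140
15   190     40     140
16   210     20     120
17   223     13     66
18   232     9      44         44
19   250     18     54
20   260     10     56
Mean                53.33      25.82    9.60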

5. After filtering, apply the calculation from the algorithm above:

No.  Number  Delta  Close  In a row  Maximum  Dense  Density  Density v2
1    40                              4
2    50      10     0      0                  50
3    51      1      1      1                  51     0.80     1.00
4    52      1      1      2                  52
5    53      1      1      3                  53
6    54      1      1      4                  54
7    60      6      0      0
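Filter steps 1-4 and the first table can be reproduced with a short C++ sketch (C++ as a portable stand-in for MQL4; `FilterResult`, `seriesMean` and `runFilter` are illustrative names). It reads step 1 the way the table does: the value on the row of x[k] is 2*((x[k]-x[k-1])+(x[k-1]-x[k-2])), and "below the mean" is taken as strictly below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FilterResult {
    std::vector<double> means;  // mean of each successive series (the "Mean" row)
    std::vector<double> kept;   // doubled sums left when the loop stops
};

// Arithmetic mean of a non-empty series.
double seriesMean(const std::vector<double>& v) {
    assert(!v.empty());
    double s = 0.0;
    for (double x : v) s += x;
    return s / static_cast<double>(v.size());
}

FilterResult runFilter(const std::vector<double>& x) {
    assert(x.size() >= 3);
    // Step 1: double the sum of the two consecutive deltas ending at x[k].
    std::vector<double> s;
    for (std::size_t k = 2; k < x.size(); ++k)
        s.push_back(2.0 * ((x[k] - x[k - 1]) + (x[k - 1] - x[k - 2])));

    const std::size_t original = s.size();
    FilterResult r;
    r.means.push_back(seriesMean(s));
    // Step 4: repeat while at least half of the original values remain.
    while (2 * s.size() >= original) {
        const double m = r.means.back();          // step 2: mean of current series
        std::vector<double> next;
        for (double v : s)
            if (v < m) next.push_back(v);         // step 3: keep values below the mean
        s.swap(next);
        if (s.empty()) break;                     // happens only if all values were equal
        r.means.push_back(seriesMean(s));
    }
    r.kept = s;
    return r;
}
```

For the 20 numbers above this gives means of about 53.33, 25.82 and 9.60 and leaves the sums 22, 4, 4, 4, 14 on the rows of 51, 52, 53, 54 and 60, matching the table. Because every pass removes at least one value unless all values are equal, the loop always terminates.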

I tried different figures and got a plausible result; I would be glad to hear critical comments.
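The main algorithm behind the second table (deltas, a 1/0 "close" flag against the mean delta, a running count, its maximum, then walking back to the last zero) might look like this in C++. `DenseRange` and `densestRange` are illustrative names, and the range is assumed to run from the row where the count was 0 to the row with the maximum, which is what the "Dense" column shows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct DenseRange {
    std::size_t first = 0, last = 0;  // inclusive indices into the input
    int run = 0;                      // consecutive "close" deltas; 0 = none found
};

DenseRange densestRange(const std::vector<double>& x) {
    assert(x.size() >= 2);
    // 1. Mean of the deltas between neighbouring numbers.
    double sum = 0.0;
    for (std::size_t k = 1; k < x.size(); ++k) sum += x[k] - x[k - 1];
    const double meanDelta = sum / static_cast<double>(x.size() - 1);

    DenseRange best;
    int run = 0;
    for (std::size_t k = 1; k < x.size(); ++k) {
        // 2. close = 1 if the delta is below the mean delta, otherwise 0.
        const bool isClose = (x[k] - x[k - 1]) < meanDelta;
        // 3. Running total: grows while deltas are close, resets to 0 otherwise.
        run = isClose ? run + 1 : 0;
        // 4. Track the maximum; 5. a run of `run` deltas spans run + 1 numbers,
        //    starting at the number on the last zero row.
        if (run > best.run) {
            best.run = run;
            best.last = k;
            best.first = k - static_cast<std::size_t>(run);
        }
    }
    return best;
}
```

On 40, 50, 51, 52, 53, 54, 60 it returns indices 1 to 5 (50 ... 54) with a count of 4, as in the table. Applied directly to the unfiltered 20 numbers, the mean delta is 250/19 ≈ 13.2, so every delta from 10 up to 80 counts as close and the result is 10 ... 80; this is the failure the filter is meant to prevent.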
 
-Aleks-:
So you have counted the differences in the "Delta" column; what do you suggest we do next?
Why are you going around in circles? It was written here long ago.
 
Dmitry Fedoseev:
Why are you going around in circles? It's been written here for a long time now

Here you state "The longest section is where the original series is below average", but as I understand it, that describes a flaw in my algorithm, which is why I decided to add a filter. I have made it, and now the algorithm no longer fails so obviously when the numbers differ significantly from each other.

 
-Aleks-:

Here you state "The longest stretch is where the original series is below average", but as I understand it, that describes a flaw in my algorithm, which is why I decided to add a filter. I have made it, and now the algorithm no longer fails so obviously when the numbers differ significantly from each other.

What is the disadvantage?

The filter is not a substitute for the algorithm. The filter is an addition to the algorithm.

 
Dmitry Fedoseev:

What is the disadvantage?

The filter is not a substitute for the algorithm. The filter is an addition to the algorithm.

I don't know what the disadvantage is - I may not see it yet.

I think I should try to code it now - can you help me if I have difficulties?

 
-Aleks-:

I don't know what the downside is - I may not see it yet.

I think I need to try to code it now; can you help me if I run into difficulties?

You should start first. Maybe you won't have any difficulties. But I won't think about anything until then, because every time it turns out that I'm thinking about the wrong thing or in the wrong way...
 
Dmitry Fedoseev:
because every time it turns out that I'm thinking about the wrong thing or in the wrong way...
That's what makes people unique...
 

I've started implementing the algorithm and am working on the filter now. The difficulty is keeping the two columns, "Number" and "Delta", in sync.

Any ideas on how to fix this would be welcome:

//+------------------------------------------------------------------+
//|                                              Test_FindOblast.mq4 |
//|                        Copyright 2017, MetaQuotes Software Corp. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2017, MetaQuotes Software Corp."
#property link      "https://www.mql5.com"
#property version   "1.00"
#property strict
//+------------------------------------------------------------------+
//| Arithmetic mean of the first n elements of an array              |
//+------------------------------------------------------------------+
double Mean(const double &a[],const int n)
  {
   if(n<=0) return(0.0);
   double s=0;
   for(int i=0;i<n;i++) s+=a[i];
   return(s/n);
  }
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
   double Digit[20]=
     {
      10,20,30,40,50,51,52,53,54,60,
      70,80,120,150,190,210,223,232,250,260
     };
   int massivSize=ArraySize(Digit); // array size, taken from the data itself

//--- Filter
//1. Sum two consecutive deltas and multiply the result by two.
//   The sum on the row of Digit[i] uses Digit[i-2], Digit[i-1] and Digit[i], so it exists for i>=2.
   int n=massivSize-2;  // number of values left in the series
   double summDelta[];  // dynamic arrays: ArrayResize() works only on dynamic arrays
   int    idx[];        // idx[k] = index in Digit[] of the number that summDelta[k] belongs to
   ArrayResize(summDelta,n);
   ArrayResize(idx,n);
   for(int i=2;i<massivSize;i++)
     {
      summDelta[i-2]=((Digit[i]-Digit[i-1])+(Digit[i-1]-Digit[i-2]))*2;
      idx[i-2]=i;
     }
   for(int k=0;k<n;k++) printf("Digit[%d] = %G  summDelta = %G",idx[k],Digit[idx[k]],summDelta[k]);

//2. Find the mean of the resulting numerical series
//3. Build a new series from the values below the mean; each value is copied together with
//   its index, so "Number" and "Delta" stay in sync (nothing is zeroed or sorted)
//4. Repeat steps 2-3 until the series is shorter than half of the original series
   int nOrig=n;
   for(int Z=0;n>0 && 2*n>=nOrig;Z++)
     {
      double avrMass=Mean(summDelta,n);
      printf("Mean of numerical series %d = %G",Z,avrMass);
      int kept=0;
      for(int k=0;k<n;k++)
        {
         if(summDelta[k]<avrMass)
           {
            summDelta[kept]=summDelta[k];
            idx[kept]=idx[k];
            kept++;
           }
        }
      n=kept;
      ArrayResize(summDelta,n);
      ArrayResize(idx,n);
      Print("N=",n);
      for(int k=0;k<n;k++) printf("Digit[%d] = %G  summDelta = %G",idx[k],Digit[idx[k]],summDelta[k]);
     }
   if(n==0)
     {
      Print("The filter removed every value");
      return;
     }
   printf("Mean of the filtered series = %G",Mean(summDelta,n));

//--- Numbers used by the remaining sums (Digit[i-2]..Digit[i]);
//--- for the sample data: 40, 50, 51, 52, 53, 54, 60, as in the second table
   int used[];
   ArrayResize(used,massivSize);
   ArrayInitialize(used,0);
   for(int k=0;k<n;k++)
      for(int j=idx[k]-2;j<=idx[k];j++) used[j]=1;
   double Num[];
   int m=0;
   for(int j=0;j<massivSize;j++)
     {
      if(used[j]==0) continue;
      ArrayResize(Num,m+1);
      Num[m]=Digit[j];
      m++;
     }

//--- Main algorithm
//1. Find the differences between the numbers: this is exactly how close they are to each other.
   double delta[];
   ArrayResize(delta,m-1);
   for(int j=1;j<m;j++) delta[j-1]=Num[j]-Num[j-1];
   double avrDelta=Mean(delta,m-1);

//2. If the delta is below the mean of the deltas from step 1, then 1, otherwise 0.
//3. If the value from step 2 is 1, add it to the previous total, otherwise reset to 0.
//4. Find the maximum value from step 3.
   int run=0,maxRun=0,maxPos=0;
   for(int j=0;j<m-1;j++)
     {
      int isClose=(delta[j]<avrDelta) ? 1 : 0;
      run=(isClose==1) ? run+1 : 0;
      if(run>maxRun)
        {
         maxRun=run;
         maxPos=j;
        }
     }

//5. Determine the range: take the value from step 4 and search upward in step 3 for the zero,
//   then increase the found number by one. Here: a run of maxRun close deltas covers
//   maxRun+1 numbers, starting at the number on the zero row (50 in the second table).
//   This gives the range of numbers whose density is highest relative to the others.
   if(maxRun==0)
     {
      Print("No dense range found");
      return;
     }
   int last=maxPos+1;      // delta[j] is the gap between Num[j] and Num[j+1]
   int first=last-maxRun;
   printf("Densest range: %G ... %G (%d numbers)",Num[first],Num[last],maxRun+1);
  }
//+------------------------------------------------------------------+
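The "Number" and "Delta" columns drift apart whenever the two arrays are zeroed, sorted or resized independently of each other. One way to keep them in sync, sketched in C++ with illustrative names (`Row`, `makeRows`, `filterPass`), is to store each number together with its doubled delta sum in a single record and filter the records with the standard erase-remove idiom; `std::remove_if` keeps the surviving rows in their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One table row: a number and the doubled sum of the two deltas ending at it.
// Because both live in one record, filtering can never misalign them.
struct Row {
    double number;
    double sum;
};

std::vector<Row> makeRows(const std::vector<double>& x) {
    std::vector<Row> rows;
    for (std::size_t k = 2; k < x.size(); ++k)
        rows.push_back({x[k], 2.0 * ((x[k] - x[k - 1]) + (x[k - 1] - x[k - 2]))});
    return rows;
}

// One filtering pass: drop rows whose sum is not below the mean, keeping order.
void filterPass(std::vector<Row>& rows) {
    assert(!rows.empty());
    double mean = 0.0;
    for (const Row& r : rows) mean += r.sum;
    mean /= static_cast<double>(rows.size());
    rows.erase(std::remove_if(rows.begin(), rows.end(),
                              [mean](const Row& r) { return r.sum >= mean; }),
               rows.end());
}
```

Calling `filterPass` while at least half of the original rows remain leaves, for the thread's 20 numbers, the rows (51, 22), (52, 4), (53, 4), (54, 4) and (60, 14), each number still paired with its own sum.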