Algorithm for combining ranges of a segment - help to create - page 2

 
struct SAllVariants{
   int m_cut[][2];
   int len;
   int li;
   void Init(int aLen){
      ArrayResize(m_cut,aLen);
      len=0;
   }
   void AddCut(int & a[][2],int i){
      m_cut[len][0]=a[i][0];
      m_cut[len][1]=a[i][1];
      len++;
      li=i;
   }
   void Set(SAllVariants & a){
      len=a.len;
      li=a.li;
      for(int i=0;i<len;i++){
         m_cut[i][0]=a.m_cut[i][0];
         m_cut[i][1]=a.m_cut[i][1];
      }
   }
};

SAllVariants av[];

int cut[][2];

void OnStart(){

   Print("=============================================================");
  
   // fill the array with a hundred arbitrary segments
   FillArray(cut,100);

   //=== алгоритм ====================
   
   // fix every segment so that its first coordinate is less than the second
   int itmp;
   for(int i=0;i<ArrayRange(cut,0);i++){
      if(cut[i][0]>cut[i][1]){
         itmp=cut[i][0];
         cut[i][0]=cut[i][1];
         cut[i][1]=itmp;
      }
   }
   
   // sort the array by the first coordinate
   ArraySort(cut);
   ArrayPrint(cut);
   // remove zero-length segments and duplicate segments
   bool ex;
   int ti=0;
   for(int i=0;i<ArrayRange(cut,0);i++){
      if(cut[i][0]!=cut[i][1]){
         ex=false;
         for(int j=i-1;j>=0 && cut[j][0]==cut[i][0];j--){
            if(cut[j][0]==cut[i][0] && cut[j][1]==cut[i][1]){
               ex=true;
               break;
            }
         }
         if(!ex){
            cut[ti][0]=cut[i][0];
            cut[ti][1]=cut[i][1];
            ti++;
         }
      }
   }
   ArrayResize(cut,ti);
   
   // add the first segment to the array of all variants, plus any other segments with the same smallest start coordinate
   ArrayResize(av,1);
   av[0].Init(ArrayRange(cut,0)); // in case a chain ends up containing every segment
   av[0].AddCut(cut,0);
   for(int i=1;i<ArrayRange(cut,0);i++){ 
      if(cut[0][0]==cut[i][0]){
         ArrayResize(av,ArraySize(av)+1);
         av[ArraySize(av)-1].Init(ArrayRange(cut,0));
         av[ArraySize(av)-1].AddCut(cut,i);
      }
   }

   // also add, as starting points, segments that begin slightly later, but no later than the end of the longest initial segment
   
   // but first find the range
   int mn=av[0].m_cut[0][0];
   int mx=av[0].m_cut[0][1];
   for(int i=1;i<ArraySize(av);i++){
      mx=MathMax(mx,av[i].m_cut[0][1]);
   }
   
   // add them
   for(int i=1;i<ArrayRange(cut,0);i++){
      if(cut[i][0]>mn && cut[i][0]<mx){
         ArrayResize(av,ArraySize(av)+1);
         av[ArraySize(av)-1].Init(ArrayRange(cut,0));
         av[ArraySize(av)-1].AddCut(cut,i);
      }
   }   

   // and now the most interesting part
   double r;
   bool n;
   for(int i=0;i<ArraySize(av) && !IsStopped();i++){ // for each variant
      //Comment(i," ",ArraySize(av));
      // find the smallest distance to the next segment
      r=DBL_MAX;
      for(int j=av[i].li+1;j<ArrayRange(cut,0);j++){
         if(cut[j][0]>=av[i].m_cut[av[i].len-1][1]){
            r=MathMin(r,cut[j][0]-av[i].m_cut[av[i].len-1][1]); // gaps are allowed (building a chain matters more than keeping it continuous)
         }
      }
      if(r!=DBL_MAX){
         n=false;
         for(int j=av[i].li+1;j<ArrayRange(cut,0);j++){
            if(cut[j][0]-av[i].m_cut[av[i].len-1][1]==r){
               if(!n){
                  n=true;
                  av[i].AddCut(cut,j);
               }
               else{
                  ArrayResize(av,ArraySize(av)+1);
                  av[ArraySize(av)-1].Init(ArrayRange(cut,0));
                  av[ArraySize(av)-1].Set(av[i]);
                  av[ArraySize(av)-1].AddCut(cut,j);
               }
            }
         }
         if(n){
            i--;
         }
      }
   }

   string str="";
   for(int i=0;i<ArraySize(av) && !IsStopped();i++){
      str="";
      for(int j=0;j<av[i].len;j++){
         str=str+(string)av[i].m_cut[j][0]+"-"+(string)av[i].m_cut[j][1]+" ";
      }
      Print("Вариант ",i," - ",str);
   }

}
//+------------------------------------------------------------------+

void FillArray(int & a[][2],int cutCnt){
   ArrayResize(a,cutCnt);
   for(int i=0;i<cutCnt;i++){
      a[i][0]=MathRand()%30;
      a[i][1]=a[i][0]+MathRand()%15;
   }   
}

Something like this...

 
Dmitry Fedoseev:

Something like this...

Thank you! You can see the hand of a master right away! I'll try it out tomorrow and see what comes out.

 
Dmitry Fedoseev:

Something like this...

I started trying out the code and created an artificial example by filling the array manually:

FillArray(cut,4);
cut[0][0]=0;
cut[0][1]=1;
cut[1][0]=3;
cut[1][1]=6;
cut[2][0]=2;
cut[2][1]=5;
cut[3][0]=7;
cut[3][1]=9;

The result:

2021.04.21 00:52:08.328 Q_Podbor (Si-6.21,W1)       [,0][,1]
2021.04.21 00:52:08.328 Q_Podbor (Si-6.21,W1)   [0,]   0   1
2021.04.21 00:52:08.328 Q_Podbor (Si-6.21,W1)   [1,]   2   5
2021.04.21 00:52:08.328 Q_Podbor (Si-6.21,W1)   [2,]   3   6
2021.04.21 00:52:08.328 Q_Podbor (Si-6.21,W1)   [3,]   7   9
2021.04.21 00:52:08.328 Q_Podbor (Si-6.21,W1)   Вариант 0 - 0-1 2-5 7-9 

That is, we got one variant, but a second variant is expected as well:

Вариант 1 - 0-1 3-6 7-9 

Is it possible to teach the algorithm to find it too?
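
A likely reason only one variant appears: as far as the posted script shows, a chain is extended only by the segment whose start is closest to the chain's current end, and it branches only on an exact tie. After 0-1 the closest start is 2 (gap 1), so 3-6 (gap 2) is never tried. Below is a minimal Python sketch of that forward step, not the MQL5 script itself; the names `normalize`, `extend_forward` and `build_from_start` are made up for illustration, and the sketch assumes the input contains at least one non-zero-length segment.

```python
def normalize(segments):
    """Order endpoints, drop zero-length and duplicate segments, sort by start."""
    return sorted({(min(a, b), max(a, b)) for a, b in segments if a != b})


def extend_forward(cut, chain):
    """Append the nearest following segment; branch only when several tie."""
    end = chain[-1][1]
    gaps = [(seg[0] - end, seg) for seg in cut if seg[0] >= end]
    if not gaps:
        return [chain]  # nothing starts at or after the current end
    nearest = min(gap for gap, _ in gaps)
    result = []
    for gap, seg in gaps:
        if gap == nearest:
            result.extend(extend_forward(cut, chain + [seg]))
    return result


def build_from_start(segments):
    """Seed only with the earliest segments, as the first script does."""
    cut = normalize(segments)
    first = cut[0][0]
    longest_end = max(end for start, end in cut if start == first)
    seeds = [seg for seg in cut
             if seg[0] == first or first < seg[0] < longest_end]
    chains = []
    for seed in seeds:
        chains.extend(extend_forward(cut, [seed]))
    return chains


example = [(0, 1), (3, 6), (2, 5), (7, 9)]
for chain in build_from_start(example):
    print(" ".join(f"{a}-{b}" for a, b in chain))  # prints: 0-1 2-5 7-9
```

With the four test segments this yields the single chain 0-1 2-5 7-9, matching the log above.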

 
Aleksey Vyazmikin:

I started trying out the code and created an artificial example by filling the array manually:

The result:

That is, we got one variant, but we are expecting another variant as well.

Can the algorithm learn to find it too?

And the variant "0-1 3-6 7-9" is no better than the variant "0-1 2-5 7-9": both run from 0 to 9 and have two gaps with a combined length of 3.
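
Checking the arithmetic for the two chains: both span 0 to 9 and both have two gaps totalling 3, although the individual gaps come in a different order. A small Python sketch (the helper name `chain_summary` is hypothetical):

```python
def chain_summary(chain):
    """Return (start, end, gap lengths) for a chain of (start, end) segments."""
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(chain, chain[1:])]
    return chain[0][0], chain[-1][1], gaps


a = [(0, 1), (2, 5), (7, 9)]
b = [(0, 1), (3, 6), (7, 9)]
print(chain_summary(a))  # (0, 9, [1, 2])
print(chain_summary(b))  # (0, 9, [2, 1])
```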

There are two options in this case:

1 - do the same thing, but starting from the end of the set of segments.

2 - search not only for the closest segment but for any segment within a tolerance. In that case, though, there will be even more variants if there is a lot of data and many chains that can be joined.

However, after option 1 you will want to start building chains from every possible starting position. That is correct, but it considerably increases the amount of work for the algorithm.

Yes! Variants need to be built starting from each segment of the initial set, extending both toward the beginning and toward the end.

 
Dmitry Fedoseev:

And the option "0-1 3-6 7-9" is no better than the option "0-1 2-5 7-9": both run from 0 to 9 and have two gaps with a combined length of 3.

I agree that in this case they are equivalent, but they are still different chains. By the terms of the problem we need to evaluate the sum of the segments' scores, and until a chain is built we don't know the combined score of all its segments.

Dmitry Fedoseev:

However, after option 1 you will want to start building chains from every possible starting position. That is correct, but it considerably increases the amount of work for the algorithm.

Yes! Variants need to be built starting from each segment of the initial set, extending both toward the beginning and toward the end.

This also seems to me to be the more correct strategy! However, I think there may be duplicate variants.

Can you help by writing some code?

 
struct SAllVariants{
   
   int m_cut[][2];
   int len;

   int m_cut1[][2]; // toward the end
   int len1;
   int li1;
   
   int m_cut2[][2]; // toward the start
   int len2;
   int li2;   
   
   bool isDopel;
   
   void Init(int aLen){
      ArrayResize(m_cut1,aLen);
      len1=0;
      ArrayResize(m_cut2,aLen);
      len2=0;      
   }
   void AddCut1(int & a[][2],int i){
      m_cut1[len1][0]=a[i][0];
      m_cut1[len1][1]=a[i][1];
      len1++;
      li1=i;
   }
   void AddCut2(int & a[][2],int i){
      m_cut2[len2][0]=a[i][0];
      m_cut2[len2][1]=a[i][1];
      len2++;
      li2=i;
   }   
   void Set(SAllVariants & a){
      len1=a.len1;
      li1=a.li1;
      for(int i=0;i<len1;i++){
         m_cut1[i][0]=a.m_cut1[i][0];
         m_cut1[i][1]=a.m_cut1[i][1];
      }
      len2=a.len2;
      li2=a.li2;
      for(int i=0;i<len2;i++){
         m_cut2[i][0]=a.m_cut2[i][0];
         m_cut2[i][1]=a.m_cut2[i][1];
      }      
   }
   
   bool Eq(SAllVariants & a){
      if(len1!=a.len1 || len2!=a.len2){
         return(false);
      }
      for(int i=0;i<len1;i++){
         if(m_cut1[i][0]!= a.m_cut1[i][0] || m_cut1[i][1]!= a.m_cut1[i][1]){
            return(false);
         }
      }
      for(int i=0;i<len2;i++){
         if(m_cut2[i][0]!= a.m_cut2[i][0] || m_cut2[i][1]!= a.m_cut2[i][1]){
            return(false);
         }      
      }      
      return(true);
   }
   
   void Combine(){
      len=len1+len2-1;
      ArrayResize(m_cut,len);
      int j=0;
      for(int i=len2-1;i>0;i--){
         m_cut[j][0]=m_cut2[i][0];
         m_cut[j][1]=m_cut2[i][1];         
         j++;
      }
      for(int i=0;i<len1;i++){
         m_cut[j][0]=m_cut1[i][0];
         m_cut[j][1]=m_cut1[i][1]; 
         j++;
      }      
   }
};

SAllVariants av[];

int cut[][2];

void OnStart(){

   Print("=============================================================");
  
   // fill the array with a hundred arbitrary segments
   FillArray(cut,100);

   //=== алгоритм ====================
   
   // fix every segment so that its first coordinate is less than the second
   int itmp;
   for(int i=0;i<ArrayRange(cut,0);i++){
      if(cut[i][0]>cut[i][1]){
         itmp=cut[i][0];
         cut[i][0]=cut[i][1];
         cut[i][1]=itmp;
      }
   }
   
   // sort the array by the first coordinate
   ArraySort(cut);
   ArrayPrint(cut);
   // remove zero-length segments and duplicate segments
   bool ex;
   int ti=0;
   for(int i=0;i<ArrayRange(cut,0);i++){
      if(cut[i][0]!=cut[i][1]){
         ex=false;
         for(int j=i-1;j>=0 && cut[j][0]==cut[i][0];j--){
            if(cut[j][0]==cut[i][0] && cut[j][1]==cut[i][1]){
               ex=true;
               break;
            }
         }
         if(!ex){
            cut[ti][0]=cut[i][0];
            cut[ti][1]=cut[i][1];
            ti++;
         }
      }
   }
   ArrayResize(cut,ti);
   
   // add every segment to the array of all variants as a starting point
   ArrayResize(av,ArrayRange(cut,0));
   for(int i=0;i<ArrayRange(cut,0);i++){
      av[i].Init(ArrayRange(cut,0));
      av[i].AddCut1(cut,i); // into the array that grows toward the end
      av[i].AddCut2(cut,i); // into the array that grows toward the start
   }


   // and now the most interesting part
   
   // toward the end
   
   double r;
   bool n;
   for(int i=0;i<ArraySize(av) && !IsStopped();i++){ // for each variant
      // find the smallest distance to the next segment
      r=DBL_MAX;
      for(int j=av[i].li1+1;j<ArrayRange(cut,0);j++){
         if(cut[j][0]>=av[i].m_cut1[av[i].len1-1][1]){
            r=MathMin(r,cut[j][0]-av[i].m_cut1[av[i].len1-1][1]); // gaps are allowed (building a chain matters more than keeping it continuous)
         }
      }
      if(r!=DBL_MAX){
         n=false;
         for(int j=av[i].li1+1;j<ArrayRange(cut,0);j++){
            if(cut[j][0]-av[i].m_cut1[av[i].len1-1][1]==r){
               if(!n){
                  n=true;
                  av[i].AddCut1(cut,j);
               }
               else{
                  ArrayResize(av,ArraySize(av)+1);
                  av[ArraySize(av)-1].Init(ArrayRange(cut,0));
                  av[ArraySize(av)-1].Set(av[i]);
                  av[ArraySize(av)-1].AddCut1(cut,j);
               }
            }
         }
         if(n){
            i--;
         }
      }
   }
   
   // toward the start
   
   for(int i=0;i<ArraySize(av) && !IsStopped();i++){ // for each variant
      // find the smallest distance to the next segment (moving toward the start)
      r=DBL_MAX;
      for(int j=av[i].li2-1;j>=0;j--){
         if(cut[j][1]<=av[i].m_cut2[av[i].len2-1][0]){
            r=MathMin(r,av[i].m_cut2[av[i].len2-1][0]-cut[j][1]); // gaps are allowed (building a chain matters more than keeping it continuous)
         }
      }
      if(r!=DBL_MAX){
         n=false;
         for(int j=av[i].li2-1;j>=0;j--){
            if(av[i].m_cut2[av[i].len2-1][0]-cut[j][1]==r){
               if(!n){
                  n=true;
                  av[i].AddCut2(cut,j);
               }
               else{
                  ArrayResize(av,ArraySize(av)+1);
                  av[ArraySize(av)-1].Init(ArrayRange(cut,0));
                  av[ArraySize(av)-1].Set(av[i]);
                  av[ArraySize(av)-1].AddCut2(cut,j);
               }
            }
         }
         if(n){
            i--;
         }
      }
   } 
   
   // mark duplicates
   
   for(int i=0;i<ArraySize(av);i++){   
      av[i].isDopel=false;
      for(int j=0;j<i;j++){   
         if(av[i].Eq(av[j])){
            av[i].isDopel=true;
            break;
         }
      }
   }
   
   // merge the two arrays into one
   
   for(int i=0;i<ArraySize(av);i++){
      if(!av[i].isDopel){
         av[i].Combine();
      }
   }
   
   // output
   
   int h=FileOpen("Выстраивание отрезков.txt",FILE_TXT|FILE_WRITE);
   
   for(int i=0;i<ArrayRange(cut,0);i++){
      FileWrite(h,(string)cut[i][0]+"-"+(string)cut[i][1]);
   }   
   
   FileWrite(h,"");
   
   string str="";
   int vn=0;
   for(int i=0;i<ArraySize(av) && !IsStopped();i++){
      if(!av[i].isDopel){
         str="";
         for(int j=0;j<av[i].len;j++){
            str=str+(string)av[i].m_cut[j][0]+"-"+(string)av[i].m_cut[j][1]+" ";
         }
         Print("Вариант ",vn," - ",str);
         FileWrite(h,"Вариант ",vn," - ",str);
         vn++;
      }
   }
   
   FileClose(h);

}
//+------------------------------------------------------------------+

void FillArray(int & a[][2],int cutCnt){
   ArrayResize(a,cutCnt);
   for(int i=0;i<cutCnt;i++){
      a[i][0]=MathRand()%30;
      a[i][1]=a[i][0]+MathRand()%15;
   }   
}

I didn't remove the duplicates from the array, I only marked them. Since each variant now stores its segments in two arrays, they can be merged into a single array with the Combine() method for convenience.

 
Dmitry Fedoseev:

I didn't remove the duplicates from the array, I only marked them. Since each variant now stores its segments in two arrays, they can be merged into a single array with the Combine() method for convenience.

Dmitry, thanks for the new algorithm!

Yes, there are indeed a lot of copies.

2021.04.22 16:55:43.829 Q_Podbor_02 (Si-6.21,M1)        Вариант 0 - 0-1 2-5 7-9 
2021.04.22 16:55:43.829 Q_Podbor_02 (Si-6.21,M1)        Вариант 1 - 0-1 2-5 7-9 
2021.04.22 16:55:43.829 Q_Podbor_02 (Si-6.21,M1)        Вариант 2 - 0-1 3-6 7-9 
2021.04.22 16:55:43.829 Q_Podbor_02 (Si-6.21,M1)        Вариант 3 - 0-1 3-6 7-9 
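
The repeated lines in this output seem to follow from how `Eq()` compares variants in the posted script: it checks `m_cut1` and `m_cut2` separately, so the same chain grown from different starting segments is split differently between the two arrays and is not recognized as a duplicate. The following is a minimal Python sketch, not MQL5, that builds chains from every segment in both directions and then removes duplicates by comparing the combined chains; `extend`, `all_chains` and `unique` are hypothetical names.

```python
def extend(cut, chain, forward):
    """Greedy nearest-segment extension in one direction, branching on ties."""
    if forward:
        edge = chain[-1][1]
        gaps = {seg: seg[0] - edge for seg in cut if seg[0] >= edge}
    else:
        edge = chain[0][0]
        gaps = {seg: edge - seg[1] for seg in cut if seg[1] <= edge}
    if not gaps:
        return [chain]
    nearest = min(gaps.values())
    result = []
    for seg, gap in gaps.items():
        if gap == nearest:
            grown = chain + [seg] if forward else [seg] + chain
            result.extend(extend(cut, grown, forward))
    return result


def all_chains(segments):
    """Start from every segment, grow toward the end, then toward the start."""
    cut = sorted({(min(a, b), max(a, b)) for a, b in segments if a != b})
    chains = []
    for seed in cut:
        for grown in extend(cut, [seed], forward=True):
            chains.extend(extend(cut, grown, forward=False))
    return chains


def unique(chains):
    """Drop duplicates by comparing whole combined chains, keeping order."""
    seen, result = set(), []
    for chain in chains:
        key = tuple(chain)
        if key not in seen:
            seen.add(key)
            result.append(chain)
    return result


example = [(0, 1), (3, 6), (2, 5), (7, 9)]
chains = all_chains(example)
print(len(chains))          # 4, as in the log above
print(len(unique(chains)))  # 2
```

In the MQL5 script the equivalent change would presumably be to call Combine() first and compare the combined `m_cut` arrays, but that is an assumption and has not been tested here.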

As I understand it, they don't need to be counted. I couldn't wait for a run on 1000 elements to finish: my netbook started to run out of memory :(

And is it possible, when adding a segment, to keep not all combinations but only a limited number of them at the current step, say, the best 10?

 
Aleksey Vyazmikin:

Dmitry, thanks for the new algorithm!

Yes, there are indeed a lot of copies.

As I understand it, they don't need to be counted. I couldn't wait for a run on 1000 elements to finish: my netbook started to run out of memory :(

Is it possible, when adding a segment, to keep not all combinations but only a limited number of them at the current step, say, the best 10?

To know that they are the best, you have to compare them with the others, i.e. you have to generate all of them first. Optimizing the algorithm somehow is another matter, but I don't intend to devote my life to this algorithm.)

Maybe decide on a sufficiency criterion and first generate all variants starting from a single, randomly chosen segment, then from another one, and so on, until a satisfactory variant appears.

And the second option can be sped up: grow the array of variants not by one element at a time but by several dozen elements at once, and trim it at the end.
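
The idea of growing the array in chunks and trimming it at the end can be sketched as follows. This is only an illustration in Python (whose lists already over-allocate internally); the class name `VariantBuffer` and the chunk size are made up. In MQL5 itself, `ArrayResize` accepts an optional third `reserve_size` parameter that serves a similar purpose.

```python
class VariantBuffer:
    """Append-only buffer that grows by `chunk` slots at a time."""

    def __init__(self, chunk=64):
        self.chunk = chunk
        self.slots = []          # allocated slots, some possibly unused
        self.size = 0            # number of slots actually in use
        self.reallocations = 0   # how many times the buffer had to grow

    def append(self, item):
        if self.size == len(self.slots):
            # out of room: add a whole chunk instead of a single slot
            self.slots.extend([None] * self.chunk)
            self.reallocations += 1
        self.slots[self.size] = item
        self.size += 1

    def trim(self):
        """Cut off the unused tail once all variants have been added."""
        del self.slots[self.size:]
        return self.slots


buf = VariantBuffer(chunk=50)
for i in range(120):
    buf.append(i)
print(buf.reallocations, len(buf.slots))  # 3 150
variants = buf.trim()
print(len(variants))  # 120
```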

 
Dmitry Fedoseev:

To know that they are the best, you have to compare them with others, i.e. you have to get all of them first. Another thing is to optimise the algorithm somehow, but I don't have the goal of devoting my life to this algorithm.)

I'm talking about individual segments: suppose each segment has a coefficient that rates its quality; then at each iteration we branch only on, for example, the 10 segments with the best coefficients.
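
A rough sketch of what such pruning could look like, in Python rather than MQL5. The per-segment coefficient is not defined anywhere in the thread, so the `coeff` values below are invented purely for illustration, and `next_candidates` is a hypothetical helper.

```python
import heapq


def next_candidates(cut, chain_end, coeff, k=10):
    """Segments that can follow a chain ending at chain_end, best k by coefficient."""
    following = [seg for seg in cut if seg[0] >= chain_end]
    return heapq.nlargest(k, following, key=lambda seg: coeff[seg])


cut = [(0, 1), (2, 5), (3, 6), (7, 9)]
# Invented quality coefficients, one per segment (assumption, not from the thread).
coeff = {(0, 1): 0.5, (2, 5): 0.9, (3, 6): 0.7, (7, 9): 0.1}

print(next_candidates(cut, chain_end=1, coeff=coeff, k=2))  # [(2, 5), (3, 6)]
```

Unlike the nearest-segment rule, this branches on both 2-5 and 3-6 after 0-1, while the limit `k` caps how many branches each step can create.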

Dmitry Fedoseev:

Maybe decide on a sufficiency criterion and first get all variants, starting from only one segment, randomly chosen and so on, until a satisfactory variant appears.

Unfortunately, "sufficiency" is hard to assess here: you would need a benchmark from which a tolerance could be derived, and I don't have one.

Dmitry Fedoseev:

And the second option can be sped up: grow the array of variants not by one element at a time but by several dozen elements at once, and trim it at the end.

I'm not quite sure what you mean: parallelizing with OpenCL?

 
Aleksey Vyazmikin:

1. I'm talking about a single segment, let's say it has a coefficient to assess its quality, then after each iteration we branch to, for example, only the top 10 of those coefficients.

2. Unfortunately, "sufficiency" is difficult to estimate here - you need to know the benchmark, then you can determine the tolerance from it, and I don't have a benchmark.

3. I'm not quite sure what you mean: parallelizing with OpenCL?

1. Where is this coefficient?

2. What about point 1?

3. No, it's simpler. OK, I'll try to speed it up tomorrow.
