Discussion of article "Genetic Algorithms - It's Easy!" - page 5

 
shurick:

Thank you very much for the clarification, the question on removing duplicates is fully answered to my satisfaction. I am attaching the script code demonstrating the original and optimised function, which demonstrates the reduction in the number of passes in loops. In my current comment I am NOT pointing out a bug in the function, but suggesting its optimisation, and my main goal was to find out the principle of duplicate removal, to which I received a comprehensive answer. Once again, thank you very much for the library and for the clarification of the function.

110 and 160 are too much for 20 chromosomes.... You put the counter in the wrong place.

That's the right place:

//Select the second of the pair....
      for (Ch2=Ch+1;Ch2<PopulChromosCount;Ch2++)
      {
        //count_cicles++; // DEBUG count how many times we pass cycles Ch and Ch2
        if (Ch!=Ch2 && chromosomeUnique[Ch2]!=0)
        {
          count_cicles++; // DEBUG counts how many times we compare chromosomes.
          //Let's reset the count of the number of identical genes
          cnt=0;
          //Check the genes, as long as the genes are the same.
          for (Ge=1;Ge<=GeneCount;Ge++)
          {
            if (Population[Ge][Ch]!=Population[Ge][Ch2])
              break;
            else
              cnt++;
          }
          //If the number of identical genes is the same as the total number of genes
          //..the chromosome is recognised as a duplicate.
          if (cnt==GeneCount)
            chromosomeUnique[Ch2]=0;
        }

OK. Now try the same population, with the same chromosomes, but in this sequence:

int m_init[20] = {7,7,7,3,9,2,4,5,3,3,5,6,2,4,3,5,10,6,2};

What do we observe?

 

to shurick:

Indeed. The number of uniqueness checks, with all different chromosomes, can be calculated by the formula

(PopulChromosCount^2-PopulChromosCount)/2

In our example, which has 20 chromosomes (assuming all chromosomes are different), the number of checks would be:

(20*20-20)/2=190

this is confirmed by this check:

int  m_init[20]  = {1,2,3,4,5,6,6,7,8,9,9,10,11,12,13,14,15,16,17,18,19,20};

If duplicates are caught, there will be even fewer checks.

Thank you for your active participation in the project!

Appropriate changes will be made to the library. Although the changes will not affect the search capabilities of the algorithm, they will make its work more rational.

 
joo:

What are we looking at?

Result

Although the changes will not affect the search capabilities of the algorithm in any way

Absolutely right, the quality of the algorithm will not change, although initially I doubted it, but now you have proved it to me, but on these doubts found a method of optimisation :)

 
Updated UGA library and examples to the article. Current free author's version 1.2.1.
Files:
 
Update Library UGA and examples for this article. Current free authoring version 1.2.1.
Files:
 

I studied the article and in the code found such an inconsistency, in two functions for obtaining range values the same formula is used:

//Replication
void Replication
...
    //Let's set the boundaries for creating a new gene
    Minimum = C1-((C2-C1)*ReplicationOffset);
    Maximum = C2+((C2-C1)*ReplicationOffset);
...

/////////////////////////////////////////////

// Artificial Mutation.
void ArtificialMutation
...
    //Let's set the boundaries for creating a new gene
    Minimum=C1-((C2-C1)*ReplicationOffset);
    Maximum=C2+((C2-C1)*ReplicationOffset);
...

I.e. data outside the values of two genes are obtained, which is characteristic of "artificial mutation". Is this a bug or is there another explanation?

It seems to me that for the replication method it is necessary to change the signs:

Minimum = C1+((C2-C1)*ReplicationOffset);

Maximum = C2-((C2-C1)*ReplicationOffset);


 
Batohov:

I studied the article and found this inconsistency in the code, two functions use the same formula to get range values:

I.e. data outside the values of two genes are obtained, which is characteristic of "artificial mutation". Is this a bug or is there another explanation?

It seems to me that for the replication method it is necessary to change the signs:

Minimum = C1+((C2-C1)*ReplicationOffset);

Maximum = C2-((C2-C1)*ReplicationOffset);

There is no error or inconsistency. Both in replication and in artificial mutation the boundaries of probable appearance of a new gene (the code sections you cited) are defined in the same way.

But for replication the area of probability lies within these boundaries, and for artificial mutation - beyond these boundaries.

Replication serves to transfer traits characteristic of the genes of both parents (within the boundaries).

Artificial mutation serves to generate new genes different from those of the parents (outside the boundaries).

 
joo:

There is no error or inconsistency. Both in replication and in artificial mutation, the boundaries of probable appearance of a new gene (the code sections you cited) are defined in the same way.

But for replication the area of probability lies within these boundaries, and for artificial mutation - beyond these boundaries.

Replication serves to transfer traits characteristic of the genes of both parents (within the boundaries).

Artificial mutation serves to generate new genes different from those of the parents (outside the boundaries).

Thank you for your quick reply, I understand your train of thought.
 
joo:

By the way, it is very nice to be the destroyer of one of the most famous trader myths related to ZZ (the second task in the article). :)

Apparently, I did not understand the wording of the task. My statement:

Entry points for maximum profit with the condition that the min. trade is N pips, are located on the tops of the ZigZag with the condition of min. knee N + Spread pips.

 
joo:

I have posted some interesting test functions in the MQL4 forum thread "Test Multivariable Multiextremal Function", one of them is presented in the article.

If you want, you can try to find extrema of the proposed functions using other optimisation algorithms other than GA and post the results here. You are welcome to do so. It will be interesting for everyone and for me in the first place.

I realise that it is very important to evaluate the validity of the fit. One of the methods is adding noise to the original data.

The source codes for alternative optimisation methods can be found here(http://alglib.sources.ru/optimization/) and here(http://ool.sourceforge.net/).

Obviously, each optimisation algorithm performs better on its own classes of target functions.

Which target functions do you use in practice?