Machine learning in trading: theory, models, practice and algo-trading - page 1072

 
Maxim Dmitrievsky:

Yes, it's just my transformation function with cos and a random degree, not a kernel.
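
Roughly in this spirit (just a sketch of the idea, not the exact function; here the "degree" is simply a random multiplier inside the cosine):

   // sketch only, not the exact function: a cosine transform with a random "degree"
   double TransformCos(const double x,const double degree)
     {
      return(MathCos(degree*x));   // degree could be drawn once per feature, e.g. 1.0+MathRand()%7
     }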

RDF saves its own structure to a file, so with too many samples or features the files can become very large; I'm not sure about a million agents :) But you can decrease the classification error and increase model stability by increasing the number of agents with different features.

Another point: in my experience, more data != better accuracy on new data. If the predictors are bad, the result will be just as bad.

Also, GMDH can work with extremely small datasets and still approximate them well for future data.

But... we could apply a simple model like logistic regression + GMDH (instead of RDF), so the learning process would be very fast; I just don't know about the accuracy.

Regarding GMDH, that is exactly what I told you when you first mentioned it: GMDH itself acts like a neural network, so there is no use for RDF here.

 
Maxim Dmitrievsky:

Yes, there are differences. To make a decision, RDF must pass through all its nodes and leaves, so if the RDF structure is large, every decision takes more time.

For this it is better to use extremely fast models (quickly trained NNs) with a fast response, or better hardware, including a GPU.

One more problem: RDF is sensitive to noise in the data, so it is almost always overfitted. To reduce the effect of the noise, it helps to embed an LDA or PCA layer into the algorithm.

So this is not such a trivial task as it might seem at first glance.

You mean your current implementation of RDF and your previous version of RDF are completely different? Does this RDF version use a policy or something else? I am not sure about the ".rl" files; I thought those were similar to the "Mtrees" files of your previous version.

Let me see what is happening with training and testing. I noticed that the models and agents don't seem to run properly at very large values. For example, I just set the agents to 100 and the models to 100. The agents worked, but the models stopped working at 50 for each agent. I don't know why.

By the way, I am testing my algo with RDF since you have already implemented the basic code; I have run thousands and thousands of different combinations of optimization and testing in your previous version, so I have a thorough understanding of it. Otherwise, I would have to write the complete code for the "Monte Carlo" algo used in "AlphaZero" to integrate into my algo, which may take quite some time since I am not an expert programmer.

 
Maxim Dmitrievsky:

Hi, try to figure out this code, for starters:

Here we train RDF on every single feature (1 input) and save the best feature numbers in a sorted array (models). Then we can choose the few best ones.

Next step: we combine each of the best predictors with the others, check the errors again, and save the result. At this step we can apply some polynomial equations.
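
Roughly the flow is like this (just a sketch; TrainAndTestRDF() here is a made-up helper name standing in for "fit an RDF on these predictor columns and return its classification error", not the real code):

   // sketch of the selection flow only
   double err[];
   ArrayResize(err,features);
   for(int f=0;f<features;f++)
      err[f]=TrainAndTestRDF(f);       // step 1: one RDF per single predictor (hypothetical helper)
   // step 2: sort the predictors by err[] and keep the few best ones ("models")
   // step 3: combine each of the best predictors with the others (pairs, optionally
   //         with polynomial terms like x1*x2 or x1*x1) and check the errors again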

Ok, let me see now how to implement GMDH with this code.

The more you can explain the code, the quicker I can find a bridge. Actually, my problem is that I am a little weak in the syntax of some basic C++ concepts like classes, objects, arrays, etc., and hence I take more time to understand how they are implemented; otherwise, I would have directly written the GMDH class file and given it to you.

Anyway, let me see.

Please explain these 3 lines to me properly. I think this is where we need to apply GMDH:

        m[i].Set(0,RDFpolicyMatrix[i][bf]);   // column 0 of row i: value of predictor number bf
        m[i].Set(1,RDFpolicyMatrix[i][bf+1]); // columns 1 and 2: intended to hold the two output values
        m[i].Set(2,RDFpolicyMatrix[i][bf+2]); // (taken here from the columns right after bf)

I mean, please comment these 3 lines.

I think in my previous code I made a minor mistake in the loop. So I think here is the bridge, provided you know exactly what you have written about RDF :))... because I don't know much about the matrix implementation...

 ///---Remaining values of Y starting from Y_1
  for(int i=1;i<3;i++)
     m[i]=CalculateNeuron(i); ///---Calculate the individual values of Y_1, Y_2, Y_3, ... to feed to the RDF inputs
 
 
Maxim Dmitrievsky:

This is a 2D array (matrix) in the "alglib" library format. We just fill it with the value of predictor bf (index 0 of the matrix); the next indexes (1, 2) are the output values... there is an error here :) the outputs need to be taken from the columns with indexes "features-1" and "features".

m is our current matrix with 1 feature and 2 outputs, while RDFpolicyMatrix contains all the features + outputs.

fixed

You can read about it here: http://www.alglib.net/dataanalysis/generalprinciples.php#header0
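
So the indexing looks roughly like this (illustration only, assuming the two output columns really sit at indexes "features-1" and "features" as mentioned above):

   // rows = samples; the predictor columns come first, the two output columns last
   for(int i=0;i<samples;i++)
     {
      m[i].Set(0,RDFpolicyMatrix[i][bf]);          // value of the selected predictor bf
      m[i].Set(1,RDFpolicyMatrix[i][features-1]);  // first output column
      m[i].Set(2,RDFpolicyMatrix[i][features]);    // second output column
     }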

Ok, so there is some misunderstanding... let me check the code again to see how to link this to GMDH...

If you get the idea, just update me so that I don't have to waste my time thinking :))

I think RDF and GMDH are similar, and hence it is becoming difficult to integrate them with each other...

Let me think again....

 
Maxim Dmitrievsky:

No, it's easy to integrate. With GMDH we only change the input vectors, just some transformations.

At the next step we will check groups of predictors, combining each with the others (only a few from the previous selection).

Then, this loop can do all of the feature transformation you are referring to:

Next, this is the function to calculate the Neuron:

Next, Y_Final=Y_All+Y_0;

We have now broken the inputs into 3 pieces, and we can expand that to any number if required...

Here the inputs are the features (predictors) and the weights are random weights... we can take them from a random function initially, and later on they will be stored inside RDF after training.
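
Roughly like this, if it helps (just a sketch of the idea, the real function may differ; "piece" is how many inputs feed one Y):

   double inp[];    // the features/predictors, filled elsewhere
   double w[];      // random weights, e.g. w[k]=(MathRand()/32767.0)*2.0-1.0
   int    piece=3;  // how many inputs feed one "neuron"

   // sketch: Y_a = weighted sum of one piece of the inputs
   double CalculateNeuron(int a)
     {
      double y=0.0;
      for(int k=a*piece;k<(a+1)*piece && k<ArraySize(inp);k++)
         y+=w[k]*inp[k];
      return(y);
     }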

 
Maxim Dmitrievsky:

Try to remake it for the matrix now ))

But... we don't need a sum here, we need separate predictors at each step: just expand our matrix with the additional features, add them, and check the errors again.
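
Something like this, with plain arrays just to show the idea (the real code works on the alglib matrix):

   // sketch: instead of summing Y_1+Y_2+..., append each transformed value
   // as its own new column (new predictor) and re-check the error afterwards
   double row[];      // one sample's original features (filled elsewhere)
   double extra[];    // the transformed values Y_1, Y_2, ... (filled elsewhere)
   int    n=ArraySize(row);
   ArrayResize(row,n+ArraySize(extra));   // widen the sample by the new columns
   for(int k=0;k<ArraySize(extra);k++)
      row[n+k]=extra[k];                  // each Y_k becomes its own extra predictor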

Ok, give me some time and it will soon be ready.

Ok, no problem then: just remove the "+" sign and you will get separate, individual predictors:

/// --- Remaining values of Y starting from Y_1
  for(int a=1;a<3;a++)
     Y_All[a]=CalculateNeuron(a); /// --- Calculate the individual values of Y_1, Y_2, Y_3, ... to feed to the RDF inputs... here the feature transformation is done

But if you have better ways, that is also great :))... because this will be very slow due to the multiple for loops... so it's a very crude implementation :))

I wrote this code in just one hour, so I don't even like it myself :))

Matrices just don't enter my brain :))
 
Maxim Dmitrievsky:

Hehe... how about sorting a 2D matrix? :)

Also, I would request that you implement at least a few more varieties of predictors in the final version of the EA:

1. Some oscillator indicators

2. Some trend indicators

3. Some volume indicators

4. Some indicators from higher timeframes (a good filter for noisy signals)

5. Some direct close prices

Otherwise, I will have to keep asking you whenever I need to add something :)))...

If it can instantly switch from one system to another on every candle based on market changes, then it will be really great!... In fact, that was the original goal I expected from this version when you mentioned splines, feature transformations, the kernel trick, etc... The kernel trick would have helped to make the calculation much faster even on a regular server, even with large data... So now we have to rely only on perfectly trained models and fast transformation...

By the way, after 2 days of forward testing I have to say that this version of RDF seems a little more stable and reliable compared to the previous version... I mean, the forward testing matches the backtesting reasonably well, whereas the previous version was mostly over-fitted to the optimization data.

 
Maxim Dmitrievsky:

The previous version was just a concept with the basic ideas.

You can add the indicator(s) themselves, instead of close prices.
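
For example, roughly like this (just a sketch with RSI; the handle would normally be created once in OnInit, and the period 14 is arbitrary):

   // sketch: fill the agents' input vectors from an indicator buffer instead of closes
   int rsiHandle=iRSI(_Symbol,0,14,PRICE_CLOSE);
   for(int i=0;i<ArraySize(ag1.agent);i++)
     {
      CopyBuffer(rsiHandle,0,0,50,ag1.agent[i].inpVector);
      normalizeArrays(ag1.agent[i].inpVector);
     }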

No, not one indicator... I mean I get confused when I apply multiple indicators simultaneously in the array loop over "ag1.agent".

What should I use in place of "ArraySize(ag1.agent)" when I use 100 features in total, but 50 for close and 50 for high?

for(int i=0;i<ArraySize(ag1.agent);i++)///---here
     {   
      CopyClose(_Symbol,0,0,50,ag1.agent[i].inpVector);
      normalizeArrays(ag1.agent[i].inpVector);
     }
for(int i=0;i<ArraySize(ag1.agent);i++)///---here
     {   
      CopyHigh(_Symbol,0,0,50,ag1.agent[i].inpVector);
      normalizeArrays(ag1.agent[i].inpVector);
     }

So is the above code correct for an agent declaration like the one below?

CRLAgents *ag1=new CRLAgents("RlExp1iter",5,100,50,regularize,learn);///----Here

Well, I was the first person who commented on your English forum :)))... since that day I have run more than 20,000 different tests and optimizations across all my servers and all kinds of combinations of settings, so I have a good understanding of the overall concept... but my main problem is that I get stuck even with simple code...

And I can promise you that if this algo can just start to converge somewhat over time, I can optimize the code to perform at least 2 to 3 times better than what you will publish :))... all just from my experience and observations :)).

Or is the code below correct?

for(int i=0;i<50;i++)///---here
     {   
      CopyClose(_Symbol,0,0,50,ag1.agent[i].inpVector);
      normalizeArrays(ag1.agent[i].inpVector);
     }
for(int i=50;i<100;i++)///---here
     {   
      CopyHigh(_Symbol,0,0,50,ag1.agent[i].inpVector);
      normalizeArrays(ag1.agent[i].inpVector);
     }
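
Or maybe one combined loop, something like this (just a rough sketch, assuming ag1.agent really holds 100 agents)?

for(int i=0;i<ArraySize(ag1.agent);i++)
     {
      if(i<50)
         CopyClose(_Symbol,0,0,50,ag1.agent[i].inpVector);   // first 50 agents: close prices
      else
         CopyHigh(_Symbol,0,0,50,ag1.agent[i].inpVector);    // next 50 agents: highs
      normalizeArrays(ag1.agent[i].inpVector);
     }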
 
Maxim Dmitrievsky:

Thank you very much... Now I think I can add more indicators :))

 
Maxim Dmitrievsky:

Or like here, it's simpler:

Yes, I like this... this is more my style :))

The previous example is too conditional and cannot be extended with more predictors...

By the way, I used this method for random candle simulations... but I would also have to change the trade entry and exit prices for training, and that's where I got confused...

For now I will try these indicator methods and test them, and later on I will try the candle simulation method... because if it succeeds, it will be the last version of any machine learning EA ever created for the forex market :))))

Reason: