Machine learning in trading: theory, models, practice and algo trading - page 1071

 
Meh... sad programming
 
Maxim Dmitrievsky:
Meh... sad programming

You mean the above code is not correct, or are you referring to the GMDH code implementation? The GMDH code was just a rough implementation which I found quickly, but I cannot improve it for now. I think there may be better ways to code it, which I will also try.

In your current source code there are many agents and variables with similar names, accessed through chained member operators such as "ag1.agent[i].inpVector", which is confusing for me until I go through the entire code...

 
FxTrader562:

You mean the above code is not correct, or are you referring to the GMDH code implementation? The GMDH code was just a rough implementation which I found quickly, but I think there may be better ways to do it; I just don't have them for now.

In your current source code there are many agents and variables with similar names, accessed through chained member operators such as "ag1.agent[i].inpVector", which is confusing for me until I go through the entire code...

I explained it before. You have 100 predictors, so you need to copy 25 different close prices, 25 open, 25 high and 25 low prices into this array, 100 in total.

Or make a 4-predictor array and copy 1 close, 1 open, ... or the values of any other indicators.

Or wait for the article, because forum communication takes too much time (for me).

About GMDH - there are a lot of different implementations, no problem with that. But my task needs additional research.
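
For readers following along, a minimal sketch of what such a 25x4 OHLC input vector could look like in MQL5 (the function name FillInputVector is illustrative only and is not taken from the actual Expert Advisor):

bool FillInputVector(double &inpVector[])
  {
   double cl[], op[], hi[], lo[];
   // request the 25 most recent completed bars (start at shift 1 to skip the forming bar)
   if(CopyClose(_Symbol, _Period, 1, 25, cl) != 25) return(false);
   if(CopyOpen (_Symbol, _Period, 1, 25, op) != 25) return(false);
   if(CopyHigh (_Symbol, _Period, 1, 25, hi) != 25) return(false);
   if(CopyLow  (_Symbol, _Period, 1, 25, lo) != 25) return(false);

   ArrayResize(inpVector, 100);
   for(int i = 0; i < 25; i++)
     {
      inpVector[i]      = cl[i];   // predictors  0..24 : close prices
      inpVector[25 + i] = op[i];   // predictors 25..49 : open prices
      inpVector[50 + i] = hi[i];   // predictors 50..74 : high prices
      inpVector[75 + i] = lo[i];   // predictors 75..99 : low prices
     }
   return(true);
  }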

 
Maxim Dmitrievsky:

I explained it before. You have 100 predictors, so you need to copy 25 different close prices, 25 open, 25 high and 25 low prices into this array, 100 in total.

Or make a 4-predictor array and copy 1 close, 1 open, ... or the values of any other indicators.

Or wait for the article, because forum communication takes too much time (for me).

About GMDH - there are a lot of different implementations, no problem with that. But my task needs additional research.

Yes, I understand. 

Also, I have your source code, and hence it will take some time for me to understand and change it, since there is not much commenting in it. Also, I am not an expert-level programmer, so I was just trying to fast-forward the process with your help :))

Anyway, I will wait for your article with a detailed explanation.

Yes, regarding GMDH, as I mentioned before there are multiple approaches and multiple formulas, and hence you need to choose whichever one is applicable for the RDF implementation. I simply translated the general GMDH formula from the Wikipedia link you provided to me earlier into MQL5 code.

Also, I have given sufficient explanation inside the code to make it understandable. I looked into multiple Python implementations before I wrote the MQL5 code, but nothing satisfied my need. That is why I wrote GMDH in the simplest way, using a function and a switch-case statement.
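
Since the actual listing is not reproduced on this page, here is a rough, hypothetical sketch of a switch-case based GMDH partial description in MQL5; the term layout follows the standard Kolmogorov-Gabor polynomial, not FxTrader562's file, and the coefficients a[] are assumed to be fitted elsewhere:

// one way to enumerate the terms of a two-input partial polynomial
double GMDHTerm(const int term, const double x1, const double x2)
  {
   switch(term)
     {
      case 0:  return(1.0);        // constant term
      case 1:  return(x1);         // linear terms
      case 2:  return(x2);
      case 3:  return(x1 * x2);    // interaction term
      case 4:  return(x1 * x1);    // quadratic terms
      case 5:  return(x2 * x2);
      default: return(0.0);
     }
  }

// partial description y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^2
double GMDHPolynomial(const double &a[], const double x1, const double x2)
  {
   double y = 0.0;
   for(int t = 0; t < 6 && t < ArraySize(a); t++)
      y += a[t] * GMDHTerm(t, x1, x2);
   return(y);
  }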

 
FxTrader562:

Yes, I understand. 

Also, I have your source code, and hence it will take some time for me to understand and change it, since there is not much commenting in it. Also, I am not an expert-level programmer, so I was just trying to fast-forward the process with your help :))

Anyway, I will wait for your article with a detailed explanation.

Yes, regarding GMDH, as I mentioned before there are multiple approaches and multiple formulas, and hence you need to choose whichever one is applicable for the RDF implementation. I simply translated the general GMDH formula from the Wikipedia link you provided to me earlier into MQL5 code.

Also, I have given sufficient explanation inside the code to make it understandable. I looked into multiple Python implementations before I wrote the MQL5 code, but nothing satisfied my need. That is why I wrote GMDH in the simplest way, using a function and a switch-case statement.

That describes the linear case. For example, we search for the best members of the 1st row, the 2nd, etc., and then add the best members into one (new) variable from the formula. In the RDF case we cannot do so, because it is a nonlinear model, so we must just add all rows as independent variables to the inputs on every round of selection.
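
A rough sketch of that idea (an assumption based on this description, not Maxim's implementation): every generated GMDH term is appended to the RDF input vector as its own independent feature instead of being collapsed into one new variable.

void AppendGMDHFeatures(double &inp[])
  {
   int base  = ArraySize(inp);                 // original predictors
   int extra = base * (base - 1) / 2 + base;   // pairwise products plus squares
   ArrayResize(inp, base + extra);
   int idx = base;
   for(int i = 0; i < base; i++)
     {
      for(int j = i + 1; j < base; j++)
         inp[idx++] = inp[i] * inp[j];         // interaction term as a new input
      inp[idx++] = inp[i] * inp[i];            // quadratic term as a new input
     }
  }

On the next round of selection the same expansion can be applied again, so the forest always sees the raw terms rather than a single combined variable.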

 
Maxim Dmitrievsky:

That describes the linear case. For example, we search for the best members of the 1st row, the 2nd, etc., and then add the best members into one (new) variable from the formula. In the RDF case we cannot do so, because it is a nonlinear model, so we must just add all rows as independent variables to the inputs on every round of selection.

Can you please give me a sample of the code exactly where you are trying to implement it, so that I can help you better?

Are you talking about weights like w1, w2, w3...? Those are things which must be calculated and stored inside RDF during training, when we feed x1.w1, x2.w2, x3.w3... as individual inputs.

Note that in reality, when considering functions, there is no such thing as linear versus non-linear, because you can always break a non-linear function into infinitely many linear pieces. So I see no reason to complicate things. That is why we use small pieces of linear functions as inputs, and we can expand them to any number if required. But I cannot say much about the coding part for now.

Please provide me the RDF code where you are stuck, so that I can understand it better.

Or, referring to my code, you can explain what else you are looking for, because to my understanding I have converted the GMDH formula to code. So if required, we can bring randomness into it, or simply expand the base function components to any number and choose randomly.

 
FxTrader562:

Can you please give me a sample of the code exactly where you are trying to implement it, so that I can help you better?

Are you talking about weights like w1, w2, w3...? Those are things which must be calculated and stored inside RDF during training, when we feed x1.w1, x2.w2, x3.w3... as individual inputs.

Note that in reality, when considering functions, there is no such thing as linear versus non-linear, because you can always break a non-linear function into infinitely many linear pieces. So I see no reason to complicate things. That is why we use small pieces of linear functions as inputs, and we can expand them to any number if required. But I cannot say much about the coding part for now.

Please provide me the RDF code where you are stuck, so that I can understand it better.

Or, referring to my code, you can explain what else you are looking for, because to my understanding I have converted the GMDH formula to code. So if required, we can bring randomness into it, or simply expand the base function components to any number and choose randomly.

To find the linear coefficients we must use linear regression, but we can also do it directly with RF. I am reading an interesting book about this now.

This book develops different polynomials in the form of tree-structured networks, including algebraic network polynomials, kernel network polynomials, orthogonal network polynomials, trigonometric network polynomials, rational network polynomials, and dynamic network polynomials. 
 
Maxim Dmitrievsky:

To find the linear coefficients we must use linear regression, but we can also do it directly with RF. I am reading an interesting book about this now.

Ok, if you want an exact implementation of GMDH and we need to expand my code further with a higher value of m, then I can do it. But what I want you to do first is implement it with an m value of 3 and test it on the M1 timeframe, because the candle close prices on M1 are very small, and hence breaking them into 3 parts is enough to achieve our goal.

So, if you can implement GMDH this way and it works on the M1 timeframe, then for higher timeframes we can expand the for loop to bigger values of m, like 10 or 15.
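
For illustration only (the function and the splitting scheme are assumptions drawn from this discussion, not the actual code), breaking one close-to-close move into m equal linear pieces needs nothing more than a loop whose bound is m, so moving from m = 3 on M1 to 10 or 15 on higher timeframes is just a parameter change:

void SplitMove(const double prevClose, const double close, const int m, double &pieces[])
  {
   ArrayResize(pieces, m);
   double step = (close - prevClose) / m;       // size of one linear segment
   for(int k = 0; k < m; k++)
      pieces[k] = prevClose + step * (k + 1);   // end point of the k-th segment
  }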

By the way, in your current implementation of the code, did you use the kernel trick when you used the following line?

#define _kernel(ker,degree) (cos(MathPow(ker,degree)))

I researched this thoroughly but could not find which kernel function you have used in your code; I wanted to test different kernels before implementing my algo:

http://crsouza.com/2010/03/17/kernel-functions-for-machine-learning-applications/

Can you provide me some reference for which kernel function this is, or is it just your own formula?
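
For reference, the one-dimensional forms of two textbook kernels from the linked article are shown below, purely to compare against the _kernel macro above (as Maxim notes in his reply, the macro is a custom transformation rather than one of these):

// polynomial kernel: k(x, y) = (x*y + c)^d   (scalar form, for comparison only)
double KernelPoly(const double x, const double y, const double c, const double d)
  {
   return(MathPow(x * y + c, d));
  }

// Gaussian (RBF) kernel: k(x, y) = exp(-(x - y)^2 / (2*sigma^2))
double KernelRBF(const double x, const double y, const double sigma)
  {
   return(MathExp(-((x - y) * (x - y)) / (2.0 * sigma * sigma)));
  }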

Also, I have successfully implemented the candle simulation algorithm in your current code. But it may only start working after a week of simulations, so I just wanted to confirm a few things before starting them.

1. Is there a limit on how many models I can use? Can I use 1 million models?

2. Is there a limit on how many agents I can use? Can I use 1000 agents?

3. If the trained data is too large, will it slow down trading decisions? I mean, if the ".rl" file is too large, approximately how much trade execution delay can we expect? Have you done any calculations on that in terms of iterations per second, for-loop counts, etc.?

4. Basically, I am using your models and agents to create random candles during training, which will be stored in ".rl" files for LIVE trading. The ".rl" files in this version are similar to the "Mtrees" files of your previous version. Am I correct?

I am planning to run a simulation of around 10 to 100 million candles in total, which is equivalent to around 280 to 2800 years of optimisation. But before I run the training, I wanted to check several things about the code, the speed of execution, etc.

When you have free time, kindly go through my questions above and provide some answers which will help me calculate further...

 
Maxim Dmitrievsky:

Yes, it's just my transformation function with cos and a random degree, not a kernel.

RDF saves its own structure in a file, so if there are too many samples or features the files can become too large. I'm not sure about a million agents :) But you can decrease the classification error and increase model stability by increasing the number of agents with different features.

Another point: in my experience, more data != better accuracy on new data. If the predictors are bad, the result will be just as bad.

Also, GMDH can work with extremely small datasets and approximate them well for future data.

But... we could apply a simple model like logistic regression + GMDH (instead of RDF), so the learning process would be very fast, though I don't know about the accuracy.
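
Purely as a sketch of that suggestion (the names and the training scheme are assumptions, not Maxim's code), a plain gradient-descent logistic regression over an already expanded GMDH-style feature matrix could look like this:

double Sigmoid(const double z) { return(1.0 / (1.0 + MathExp(-z))); }

// X is a flattened rows*cols feature matrix, y holds 0/1 labels, w receives the weights
void TrainLogit(const double &X[], const int rows, const int cols,
                const double &y[], double &w[],
                const double lr = 0.01, const int epochs = 200)
  {
   ArrayResize(w, cols);
   ArrayInitialize(w, 0.0);
   for(int e = 0; e < epochs; e++)
      for(int r = 0; r < rows; r++)
        {
         double z = 0.0;
         for(int c = 0; c < cols; c++)
            z += w[c] * X[r * cols + c];
         double err = Sigmoid(z) - y[r];          // prediction error
         for(int c = 0; c < cols; c++)
            w[c] -= lr * err * X[r * cols + c];   // stochastic gradient step
        }
  }

Training here is just a few passes over the data, which is what would make the learning process fast; how accurate it would be on real quotes is exactly the open question.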

Regarding GMDH, that is exactly what I told you when you first mentioned it. GMDH itself acts as a neural network, and hence there is no need for RDF here.

 
Maxim Dmitrievsky:

Yes, there is a difference. To make a decision, RDF must go through all nodes and leaves, so if the RDF structure is large, every decision will take more time.

For this it is better to use extremely fast models (quickly trained NNs) with fast response, or better hardware, including a GPU.

One more problem: RDF is sensitive to noise in the data, so it is almost always overfitted. To reduce the effect of this noise, it is good to embed an LDA or PCA layer into the algorithm.

So this is not as trivial a task as it might seem at first glance.

You mean your current implementation of RDF and your previous version of RDF are completely different? Does this RDF version use a policy or something else? I am not sure about the ".rl" files; I thought they were similar to the "Mtrees" files of your previous version.

Let me see what is happening with training and testing. I noticed that the models and agents don't seem to run properly at very large values. For example, I just set the agents to 100 and the models to 100. The agents worked, but the models stopped working at 50 for each agent. I don't know for what reason.

By the way, I am testing my algo with RDF since you have already implemented the basic code, and because I have done thousands and thousands of different combinations of optimization and testing in your previous version, so I have a thorough understanding of your previous version of RDF. Otherwise, I would have to write complete code for the "Monte Carlo" algorithm used in "AlphaZero" to integrate with my algo, which may take quite some time for me since I am not an expert programmer.
