Machine learning in trading: theory, models, practice and algo-trading - page 818

 
Dr. Trader:

A forest is better suited for this; it will produce a set of rules like this:


1: DeltaLess350 <= 0.5
2: ZZ_D <= 0
Decision 0

1: DeltaLess350 <= 0.5
2: ZZ_D > 0
3: DeltaMore350 <= 0.5
Decision 0

1: DeltaLess350 <= 0.5
2: ZZ_D > 0
3: DeltaMore350 > 0.5
Decision 1

1: DeltaLess350 > 0.5
Decision 1
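The four paths are mutually exclusive and together cover every case, so the rule set can be transcribed as nested conditions and then collapsed into one Boolean expression. Here is a minimal Python sketch. The function names are hypothetical, and the comparisons are copied exactly from the rules, so the inputs only need to be numeric.

```python
from itertools import product


def tree_rules(delta_less350: float, zz_d: float, delta_more350: float) -> int:
    """The four rules above, transcribed as nested conditions."""
    if delta_less350 <= 0.5:
        if zz_d <= 0:
            return 0
        # ZZ_D > 0 from here on
        if delta_more350 <= 0.5:
            return 0
        return 1
    # DeltaLess350 > 0.5
    return 1


def single_expression(delta_less350: float, zz_d: float, delta_more350: float) -> int:
    """The same logic collapsed into one line."""
    return int(delta_less350 > 0.5 or (zz_d > 0 and delta_more350 > 0.5))


# Both versions agree on every combination of typical input values.
for combo in product((0, 1), (-1, 0, 1), (0, 1)):
    assert tree_rules(*combo) == single_expression(*combo)
```

Reading only the paths that end in Decision 1 gives the one-liner: DeltaLess350 > 0.5, or ZZ_D > 0 together with DeltaMore350 > 0.5.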


The article describes how to do this in R:
https://www.mql5.com/ru/articles/1165

On the "Model" tab, select Forest, set the number of trees to 1 in the settings, build the model, and then click the Rules button to see this list of rules.

Thanks for the answer. Maybe I'm reading it wrong, but the model doesn't seem to be correct: in Excel, the formula for the first (and subsequent) data rows is "=IF(OR(AND(B2=1;D2=1);AND(C2=1;D2=1));1;0)"
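For checking outside Excel, the formula translates directly into Python. Here is a minimal sketch, assuming columns B, C and D hold 0/1 flags (the function names are hypothetical). It also shows that D is a common factor, so the condition simplifies to D=1 AND (B=1 OR C=1).

```python
from itertools import product


def calc_excel(b: int, c: int, d: int) -> int:
    """Direct translation of =IF(OR(AND(B2=1;D2=1);AND(C2=1;D2=1));1;0)."""
    return 1 if (b == 1 and d == 1) or (c == 1 and d == 1) else 0


def calc_simplified(b: int, c: int, d: int) -> int:
    """Same logic with D factored out: D=1 AND (B=1 OR C=1)."""
    return int(d == 1 and (b == 1 or c == 1))


# Print the full truth table and confirm that both forms agree.
for b, c, d in product((0, 1), repeat=3):
    assert calc_excel(b, c, d) == calc_simplified(b, c, d)
    print(b, c, d, "->", calc_excel(b, c, d))
```

Only three of the eight combinations give 1: (1, 0, 1), (0, 1, 1) and (1, 1, 1).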

So it's better to use a forest for clear-cut logic? Just like for feature selection, right?

Thanks for the article. I'll study it again; I remember reading it a long time ago, but now I know a bit more, so maybe it will be clearer.

Added: The picture is interesting, but the logic clearly wasn't figured out... Or was it?
 

I edited my post a bit to use a tree instead of a forest, since that is more convenient. The formula looks different in the two cases, but the answer is still correct.

If the answer can be found from the right combination of input values, then yes, the forest works well. Operations like addition and multiplication don't work for the forest.

 
Aleksey Vyazmikin:

it seems that the model is not correct: in Excel, the formula for the first (and subsequent) data rows is "=IF(OR(AND(B2=1;D2=1);AND(C2=1;D2=1));1;0)"

I don't know. I checked in Excel: the forest formula was wrong only a couple of times. The tree formula matched in all cases.

 
Dr. Trader:

I edited my post a bit to use a tree instead of a forest, since that is more convenient. The formula looks different in the two cases, but the answer is still correct.

If the answer can be found from the right combination of input values, then yes, the forest works well. Operations like addition and multiplication don't work for the forest.

Okay, so it produced the result, but is there a way to reduce the condition to a single line or function that could be corrected and then applied directly in code? Or do I have to sit down and describe every branch of the logic myself, without making mistakes in interpreting the result? It's just that if the number of inputs runs into dozens, that's an extremely time-consuming process...

 
Dr. Trader:

I don't know. I checked in Excel - the forest formula was wrong only a couple of times. The tree formula matched in all cases.

Yes, judging by the code, the tree's formula works by exclusion, and that turned out to be the right solution.
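The question was how to turn the rules into a single line without transcribing every branch by hand. Because the paths of one tree are mutually exclusive and cover all cases, the tree's decision equals the OR of the paths that end in Decision 1. Here is a minimal Python sketch. The rule format (lists of (column, operator, threshold) triples plus a decision) is a hypothetical structure typed in by hand, not the output format of any R package.

```python
import operator
from typing import Callable, Dict, List, Tuple

Condition = Tuple[str, str, float]  # (column, operator, threshold)
Rule = Tuple[List[Condition], int]  # (conditions along one path, decision)

OPS: Dict[str, Callable[[float, float], bool]] = {
    "<=": operator.le,
    ">": operator.gt,
}


def rules_to_predicate(rules: List[Rule]) -> Callable[[Dict[str, float]], int]:
    """Combine every path with decision 1 into one OR-of-ANDs predicate."""
    positive_paths = [conds for conds, decision in rules if decision == 1]

    def predicate(row: Dict[str, float]) -> int:
        return int(any(
            all(OPS[op](row[col], thr) for col, op, thr in conds)
            for conds in positive_paths
        ))

    return predicate


def to_text(rules: List[Rule]) -> str:
    """Render the same predicate as a one-line formula."""
    paths = [
        " AND ".join(f"{col} {op} {thr}" for col, op, thr in conds)
        for conds, decision in rules if decision == 1
    ]
    return " OR ".join(f"({p})" for p in paths)


# The rule set posted earlier in the thread, written as data.
RULES: List[Rule] = [
    ([("DeltaLess350", "<=", 0.5), ("ZZ_D", "<=", 0)], 0),
    ([("DeltaLess350", "<=", 0.5), ("ZZ_D", ">", 0), ("DeltaMore350", "<=", 0.5)], 0),
    ([("DeltaLess350", "<=", 0.5), ("ZZ_D", ">", 0), ("DeltaMore350", ">", 0.5)], 1),
    ([("DeltaLess350", ">", 0.5)], 1),
]

predict = rules_to_predicate(RULES)
print(to_text(RULES))
print(predict({"DeltaLess350": 0, "ZZ_D": 1, "DeltaMore350": 1}))  # 1
```

With dozens of inputs, only the rule list changes. The combining code stays the same, so nothing has to be rewritten by hand.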

 

It's even simpler than that: there's no need to bother with formulas. After training you get an ordinary R model that you can use to make predictions on new data.

All prediction is done with a single function:

predict(model, newdata)

model is a previously created model (a tree, forest, neural network, etc.; R has hundreds of different models). newdata is a table with the new data to predict on.
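The train-once, predict-many pattern is not specific to R. Here is a rough Python sketch of the same interface. It uses a deliberately simple lookup-table classifier over 0/1 inputs as a stand-in, not an R tree or forest: the model memorises the majority label for each input combination seen in training. The names fit and predict are hypothetical and only mirror the shape of the R call.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def fit(rows: Sequence[Row], labels: Sequence[int]) -> Dict[Row, int]:
    """Train: remember the majority label for every input combination seen."""
    votes: Dict[Row, Counter] = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[tuple(row)][label] += 1
    return {row: counts.most_common(1)[0][0] for row, counts in votes.items()}


def predict(model: Dict[Row, int], newdata: Sequence[Row],
            default: int = 0) -> List[int]:
    """Predict: look up each new row; unseen combinations get `default`."""
    return [model.get(tuple(row), default) for row in newdata]


# Train once on (B, C, D) flags, then score new rows with the same model.
train_x = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
train_y = [1, 1, 0, 0, 0]
model = fit(train_x, train_y)
print(predict(model, [(0, 1, 1), (1, 1, 1)]))  # [1, 0]
```

Here (1, 1, 1) was never seen in training, so it falls back to the default of 0. This is where real models such as trees differ: they generalise from their splits instead of memorising exact rows.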

 
Dr. Trader:

It's even simpler than that: there's no need to bother with formulas. After training you get an ordinary R model that you can use to make predictions on new data.

All prediction is done with a single function:

model is a previously created model (a tree, forest, neural network, etc.; R has hundreds of different models). newdata is a table with the new data to predict on.

Interesting, but it's not clear to me yet; I haven't worked with R before... I should, so I can get to the bottom of it.

Another question: if the input contains erroneous, purely random data, will a forest/tree be able to detect it? Is it possible to set up automatic column-by-column elimination of inputs, i.e. iterating over them and looking for junk data in order to exclude it?

 

An ordinary forest will simply find a minimal set of inputs that determines the target. But it will not analyze noise and errors; it will even use them if they help improve prediction accuracy. That is why, for example, you can't just take a bunch of forex indicators and try to predict trends with them.

There are various more advanced modifications of the forest that try to weed out noise and errors and do everything you described, for example the gbm and xgboost packages in R. They work well in general, but they are weak for forex; you need other tricks there.
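The automatic column-by-column elimination asked about in the question can be sketched as greedy backward elimination. Drop each input in turn, re-evaluate on held-out rows, and discard the input for good if accuracy does not fall. This is a generic wrapper technique, not a feature of any forest package. To stay self-contained, the Python sketch below scores columns with a hypothetical lookup-table classifier and leave-one-out accuracy. The synthetic data (target = D AND (B OR C), plus one uninformative column) is invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def loo_accuracy(rows: Sequence[Row], labels: Sequence[int],
                 cols: Sequence[int]) -> float:
    """Leave-one-out accuracy of a majority-vote lookup table on `cols` only."""
    correct = 0
    for i, held_out in enumerate(rows):
        votes: Dict[Row, Counter] = defaultdict(Counter)
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j != i:
                votes[tuple(row[c] for c in cols)][label] += 1
        key = tuple(held_out[c] for c in cols)
        guess = votes[key].most_common(1)[0][0] if votes[key] else 0
        correct += guess == labels[i]
    return correct / len(rows)


def backward_elimination(rows: Sequence[Row], labels: Sequence[int],
                         n_cols: int) -> List[int]:
    """Greedily drop any column whose removal does not lower LOO accuracy."""
    kept = list(range(n_cols))
    best = loo_accuracy(rows, labels, kept)
    improved = True
    while improved and kept:
        improved = False
        for c in kept:
            trial = [k for k in kept if k != c]
            score = loo_accuracy(rows, labels, trial)
            if score >= best:
                kept, best, improved = trial, score, True
                break  # restart the scan with the reduced column set
    return kept


# Synthetic data: target = D AND (B OR C); column 3 is uninformative
# (an alternating 0/1 pattern independent of the target).
NAMES = ["B", "C", "D", "noise"]
rows = [(b, c, d, k % 2) for b, c, d in product((0, 1), repeat=3) for k in range(10)]
labels = [int(d == 1 and (b == 1 or c == 1)) for b, c, d, _ in rows]
kept = backward_elimination(rows, labels, len(NAMES))
print([NAMES[c] for c in kept])  # ['B', 'C', 'D']
```

The held-out evaluation is what makes this work. Training accuracy alone would never penalise the extra column, because the model can always use it.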

 
Aleksey Vyazmikin:

Can you tell me which neural network algorithm can be used to identify the logic (neuron) of the "Calc" column?

Tip number one: if you can do without machine learning, do it :)

1. Decide on the sample size; your sample is very small.

2. Start with simple (linear) classification/regression models; most likely they will be enough for you. If the error is large, you can move on to more complex (non-linear) models.

3. Never give any advice, especially about using R :))) It's time to banish them from this forum.

4. If the problem can be solved without machine learning, it is better not to use it.
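Tip 2 can be tried on the formula quoted earlier in the thread. Assume the "Calc" column is =IF(OR(AND(B2=1;D2=1);AND(C2=1;D2=1));1;0), i.e. D=1 AND (B=1 OR C=1). That target is linearly separable (for example, B + C + 2D >= 3), so even a plain perceptron learns it exactly. Here is a minimal Python sketch of the classic perceptron rule, trained on the 8-row truth table rather than the poster's actual data.

```python
from itertools import product
from typing import List, Sequence, Tuple


def train_perceptron(xs: Sequence[Sequence[int]], ys: Sequence[int],
                     epochs: int = 1000) -> Tuple[List[int], int]:
    """Classic perceptron: nudge weights toward each misclassified row."""
    w = [0] * len(xs[0])
    bias = 0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(xs, ys):
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) + bias > 0)
            if pred != y:
                step = y - pred  # +1 or -1
                w = [wi + step * xi for wi, xi in zip(w, x)]
                bias += step
                errors += 1
        if errors == 0:  # converged: every row classified correctly
            break
    return w, bias


# Truth table of D=1 AND (B=1 OR C=1) over inputs (B, C, D).
xs = list(product((0, 1), repeat=3))
ys = [int(d == 1 and (b == 1 or c == 1)) for b, c, d in xs]
w, bias = train_perceptron(xs, ys)
preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + bias > 0) for x in xs]
print(w, bias, preds == ys)
```

The epoch cap is generous. On linearly separable data, the perceptron convergence theorem guarantees that training stops after finitely many mistakes. If a linear model like this fails, that is the signal to move on to non-linear models.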

 
Dr. Trader:

An ordinary forest will simply find a minimal set of inputs that determines the target. But it will not analyze noise and errors; it will even use them if they help improve prediction accuracy. That is why, for example, you can't just take a bunch of forex indicators and try to predict trends with them.

There are various more advanced modifications of the forest that try to weed out noise and errors and do everything you described, for example the gbm and xgboost packages in R. They work well in general, but they are weak for forex; you need other tricks here.

I don't want to feed in raw indicator values. My goal is to feed in logically described observations built beforehand from those indicators (whether a situation is observed or not: it may be news, or how different indicators are positioned relative to each other; in general, it is compressed information that I use for decision-making in real trading) and to try to weed out the false ones. So the input data will be 0 and 1, or a few more values per input. (Here is a question: if I want to see the effect of weekdays, is it better to make them separate inputs?)
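On the weekday question, one common option is one-hot encoding. Instead of a single input holding 1..5, you use five separate 0/1 inputs (Monday..Friday), so a tree or linear model can single out one day without assuming the days are ordered. Here is a minimal Python sketch. The Monday=0 numbering (Python's datetime.weekday()) and the all-zeros encoding for weekends are assumptions.

```python
from datetime import date
from typing import List

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def weekday_one_hot(d: date) -> List[int]:
    """Five 0/1 inputs, one per trading day; weekends give all zeros."""
    wd = d.weekday()  # Monday == 0 ... Sunday == 6
    return [int(wd == i) for i in range(len(DAYS))]


def weekday_ordinal(d: date) -> int:
    """The alternative: a single input holding 1..5 (0 for weekends)."""
    wd = d.weekday()
    return wd + 1 if wd < 5 else 0


d = date(2024, 1, 3)  # a Wednesday
print(weekday_one_hot(d), weekday_ordinal(d))  # [0, 0, 1, 0, 0] 3
```

With the ordinal form, a tree needs two splits (> 2 and <= 3) to isolate Wednesday, and a linear model cannot isolate it at all. With one-hot inputs, a single split on the Wed column is enough.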

Has anyone compared the efficiency of different algorithms on tasks where the answer is known, as in my example, but which are more complex?
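A comparison like that can be set up on synthetic data where the true rule is known. Generate rows from the rule, score each candidate model on held-out rows, and compare the scores. Here is a minimal Python sketch using leave-one-out accuracy. The two candidates (a majority-class baseline and a lookup-table classifier) are deliberately simple hypothetical stand-ins; forests, boosting and so on would plug into the same loop.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[int, ...]
Model = Callable[[Sequence[Row], Sequence[int]], Callable[[Row], int]]


def majority_baseline(rows: Sequence[Row], labels: Sequence[int]) -> Callable[[Row], int]:
    """Ignores the inputs and always predicts the most common label."""
    top = Counter(labels).most_common(1)[0][0]
    return lambda row: top


def lookup_table(rows: Sequence[Row], labels: Sequence[int]) -> Callable[[Row], int]:
    """Majority label per seen input combination; unseen rows get 0."""
    votes: Dict[Row, Counter] = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[row][label] += 1
    table = {r: c.most_common(1)[0][0] for r, c in votes.items()}
    return lambda row: table.get(row, 0)


def loo_accuracy(train: Model, rows: List[Row], labels: List[int]) -> float:
    """Leave-one-out: train on all rows but one, test on the held-out row."""
    hits = 0
    for i in range(len(rows)):
        model = train(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        hits += model(rows[i]) == labels[i]
    return hits / len(rows)


# Known answer: D=1 AND (B=1 OR C=1), each (B, C, D) combination repeated 5 times.
rows = [combo for combo in product((0, 1), repeat=3) for _ in range(5)]
labels = [int(d == 1 and (b == 1 or c == 1)) for b, c, d in rows]
for name, train in [("majority", majority_baseline), ("lookup", lookup_table)]:
    print(name, loo_accuracy(train, rows, labels))
```

Always include a trivial baseline. Here it already reaches 0.625 because 25 of the 40 rows are 0, so any candidate has to beat that figure to count as learning the rule.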
