Machine learning in trading: theory, models, practice and algo-trading - page 3371
You've got it wrong. It looks like you have never read the tree-building code: there are no operations on a single row at all, only on sets of rows (the full set or batches).
In brief: the random/full set of rows passed to training is sorted, one predictor/column at a time. Candidate splits on it are evaluated (midpoint/percentile/random), statistics are computed for each, and the best split is selected for the whole set of rows, not for one/each row as you suggested.
According to the best split, the set of rows is divided into two subsets; each subset is then sorted again, the best split is selected for each part, and so on until a stopping rule is reached (depth, minimum number of examples, etc.).
You can see the details in the editor; you have the file:
\MQL5\Include\Math\Alglib\dataanalysis.mqh
Look at the ClassifierSplit() function and the one it is called from.
You will understand it in a couple of hours, and you won't have to talk about searching predictors one row at a time.
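The split search described above can be sketched in a few lines. This is a hypothetical illustration, not the ALGLIB code: for each predictor column the whole set of rows is sorted and candidate midpoint thresholds are scored by Gini impurity; the single best (column, threshold) pair is chosen for the entire set, never for an individual row.

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary class labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2.0 * p1 * (1.0 - p1)

def best_split(rows, labels, n_features):
    """Return (feature_index, threshold, score) of the best split
    over ALL rows, scanning each predictor column in sorted order."""
    best = (None, None, float("inf"))
    n = len(rows)
    for j in range(n_features):
        # sort the whole row set by column j; midpoints are candidates
        order = sorted(range(n), key=lambda i: rows[i][j])
        for k in range(1, n):
            lo, hi = rows[order[k - 1]][j], rows[order[k]][j]
            if lo == hi:
                continue
            thr = (lo + hi) / 2.0
            left = [labels[i] for i in order[:k]]
            right = [labels[i] for i in order[k:]]
            # weighted impurity of the two resulting subsets
            score = (len(left) * gini_impurity(left)
                     + len(right) * gini_impurity(right)) / n
            if score < best[2]:
                best = (j, thr, score)
    return best

rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [4.0, 0.5]]
labels = [0, 0, 1, 1]
print(best_split(rows, labels, 2))  # column 0 splits cleanly at 2.5
```

After this, the two subsets would each be passed through the same routine again, recursing until a stopping rule fires.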
1. RegressionTree() class
You are right about the many rows.
Let's go back to the beginning: what is a pattern in a random forest?
It is a single tree. Here is an example of one such tree from RF:
Total rows = 166+185! Not all of them fit.
There are 150 such trees in my model.
Consider again the path that forms a leaf. In my example above there are 5 splits. Isn't that a description of a pattern of two tops with a trough between them? It is.
7 splits can describe a head-and-shoulders, and so on.
Each leaf of a single tree describes a different pattern.
The forest is the opinion of a crowd (of trees).
The 1st tree says: this row falls into my 18th pattern/leaf, and the answer = 1.
The 2nd: the same row falls into my 215th pattern/leaf and gives the answer = 0.
The 3rd: = 1
...
We average and get the average opinion of 150 trees, for example = 0.78. Each tree had a different activated leaf/pattern.
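The voting described above can be shown with a toy example. The three stub "trees" here are invented for the sketch; each answers a row with the class of the single leaf the row falls into, and the forest averages those answers.

```python
def forest_predict(row, trees):
    """Average the per-tree answers (each tree returns the class,
    0 or 1, of its one activated leaf) for a single input row."""
    votes = [tree(row) for tree in trees]
    return sum(votes) / len(votes)

# three stub "trees", each encoding a different pattern of splits
trees = [
    lambda r: 1 if r[0] > 2.0 else 0,
    lambda r: 0 if r[1] > 3.0 else 1,
    lambda r: 1,
]
avg = forest_predict([2.5, 4.0], trees)
print(avg)                     # (1 + 0 + 1) / 3, about 0.667
label = 1 if avg > 0.5 else 0  # majority vote via the 0.5 border
```

With 150 trees the mechanism is identical, just with 150 votes instead of 3.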
We don't know how many leaves there are.
The number of trees is a parameter that can be varied to obtain the minimum sample size for training.
We see that 50 trees are enough, so it is convenient to treat a tree as a pattern.
The tree responds to each situation/row with one leaf/pattern. In other situations the response will come from other leaves/patterns.
It seems that not only a single leaf, but even a single tree doesn't decide anything.
Here I found the formula for the final classifier:
a(x) = (1/N) * sum_{i=1..N} b_i(x),
where N is the number of trees and b_i(x) is the answer of the i-th tree.
It is also worth noting that for the classification task the solution is chosen by majority voting, while in the regression task it is chosen by averaging.
Why doesn't it decide anything? It contributes (1/150) to the final answer.
From each tree, one of the activated leaves/patterns takes part in the vote (the average).
The answer of the forest is the average of the answers of all trees (or activated leaves/patterns); that is what this formula computes. For binary classification, the majority is reached if the average is > 0.5, giving 1, otherwise 0.
But the 0.5 border is probably not the best option; if the package gives access to the value of the average, you can experiment with different borders.
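The experiment suggested above can be sketched like this. The averaged forest outputs and the true labels here are made up for illustration; the point is that sweeping the decision border changes which rows get classified as 1 and therefore the accuracy.

```python
# hypothetical forest averages and true labels, invented for the sketch
avg_scores = [0.42, 0.55, 0.61, 0.78, 0.49, 0.83]
true_labels = [0, 0, 1, 1, 0, 1]

def classify(scores, border):
    """Binarise averaged forest outputs at a given decision border."""
    return [1 if s > border else 0 for s in scores]

def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

for border in (0.4, 0.5, 0.6):
    acc = accuracy(classify(avg_scores, border), true_labels)
    print(f"border={border}: accuracy={acc:.2f}")
```

In practice one would pick the border on a validation set, and could also optimise a trading-specific metric (profit, drawdown) instead of plain accuracy.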
Not just one leaf but all the trees are responsible for each situation; it's just that not all of their leaves are activated. The sum of the forecasts of those that are activated is the forecast from the model.
What the hell are you talking about, tree-model experts?
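The summing behaviour described above is how boosted-tree models (as opposed to averaging forests) combine their trees. A toy sketch, with leaf values invented for illustration:

```python
def boosted_predict(row, trees, base_score=0.0):
    """Sum the activated leaf value of every tree onto a base score;
    every tree contributes, via whichever leaf the row activates."""
    return base_score + sum(tree(row) for tree in trees)

# stub trees returning the value of the leaf the row falls into
trees = [
    lambda r: 0.3 if r[0] > 1.0 else -0.1,
    lambda r: 0.2 if r[1] < 5.0 else -0.2,
    lambda r: -0.05,
]
score = boosted_predict([2.0, 3.0], trees)
print(score)  # 0.3 + 0.2 - 0.05
```

Note the contrast with the random-forest sketch earlier: a forest averages independent class votes, while a boosted model sums small real-valued leaf contributions.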
Do you have any experience using LightGBM?