Machine learning in trading: theory, models, practice and algo-trading - page 881
Date: 2025-12-23
I have a question about the target variable.
If our target variable is the financial result of a trade, then it seems reasonable to normalize that result, or so I thought. But I've been looking for information on the site, and everywhere it says that the target variable must take two values: buy or sell. What if a trade loses either way, whether I buy or sell (and that is how it turned out!)? Why should I then cut out all the negative outcomes? What if it is precisely the presence of negative outcomes that affects the statistics?
In general, I would like to know which networks work (and where to get them): at the very least with a trigger output (buy / sell / do nothing), and at best with a function that does the ranking. (Earlier I asked here for such a function because I was looking for a theoretical solution, but I have since written a script that sums the predictors.)
> If our target variable is the financial result of a trade, then it seems reasonable to normalize that result, or so I thought
I don't normalize the targets; I use them as they are (the price increment). If you use a neural network, it is better to normalize the predictors (also called inputs or features). For a forest you don't really need to bother with normalization; it will work fine either way.
Note that a neural network's output very often also passes through an activation function, and for a sigmoid it can only lie within (0;1). In that case the target also needs to be normalized if it does not fall within this interval. Alternatively, you can remove the activation from the output layer so that it can take any value.
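To make the point about the sigmoid output concrete, here is a minimal sketch of scaling a target into the open interval (0;1) and back. The function names, the sample increments and the 0.05 margin are my own assumptions for illustration; the margin is there only because a sigmoid never reaches exactly 0 or 1.

```python
def fit_range(values):
    """Remember the smallest and largest target seen in the training data."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("target is constant, nothing to scale")
    return lo, hi


def to_unit(y, lo, hi, margin=0.05):
    """Map y from [lo, hi] into [margin, 1 - margin], inside (0;1)."""
    return margin + (1.0 - 2.0 * margin) * (y - lo) / (hi - lo)


def from_unit(u, lo, hi, margin=0.05):
    """Inverse of to_unit: turn a network output back into a price increment."""
    return lo + (u - margin) / (1.0 - 2.0 * margin) * (hi - lo)


# Hypothetical per-bar price increments, used only as sample data.
increments = [-0.0012, 0.0005, 0.0030, -0.0007]
lo, hi = fit_range(increments)
scaled = [to_unit(y, lo, hi) for y in increments]
restored = [from_unit(u, lo, hi) for u in scaled]
```

The range should be taken from the training data only and then reused for new data; with a linear (no activation) output none of this is needed.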
> But I've been looking for information on the site, and everywhere it says that the target variable must take two values: buy or sell.
This is called classification: instead of a price, the target is just a set of labels (0 and 1; -1 and 1; "buy" and "sell").
When you predict the price itself or its increment, that is not classification but regression.
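As a rough illustration of the difference, the same price series can give either kind of target. This is only a sketch: the sample open prices and the 0.0002 dead zone are made up, and the helper names are mine.

```python
def price_increments(prices):
    """Regression target: the change of price from one bar to the next."""
    return [b - a for a, b in zip(prices, prices[1:])]


def to_classes(increments, threshold=0.0):
    """Classification target: 1 = buy, -1 = sell, 0 = do nothing."""
    labels = []
    for d in increments:
        if d > threshold:
            labels.append(1)
        elif d < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Hypothetical open prices, used only as sample data.
opens = [1.1000, 1.1004, 1.1001, 1.1001, 1.1010]
regression_target = price_increments(opens)
classification_target = to_classes(regression_target, threshold=0.0002)
```

With threshold=0.0 and no flat bars the labels reduce to the two-value buy/sell case.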
> What if a trade loses either way, whether I buy or sell (and that is how it turned out!)? Why should I then cut out all the negative outcomes? What if it is precisely the presence of negative outcomes that affects the statistics?
It all depends on your predictors; you can only find the answer experimentally, by trying both options. For example, I tried to create my own fitness function for the forest: I built a chart of the resulting trades (taking the spread into account) from the forest's forecasts and computed the Sharpe ratio from that chart; that was the value I tried to maximize.
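A fitness function of this kind could look roughly like the sketch below. It is not the code I actually used: it assumes one trade per bar in the direction of the forecast, the spread paid on every trade, a zero risk-free rate and no annualization, and all names and numbers are illustrative.

```python
import math


def trade_results(forecasts, increments, spread):
    """Per-bar result of trading in the direction of each forecast."""
    results = []
    for forecast, inc in zip(forecasts, increments):
        if forecast > 0:        # buy: earn the increment, pay the spread
            results.append(inc - spread)
        elif forecast < 0:      # sell: earn the negative increment, pay the spread
            results.append(-inc - spread)
        else:                   # no trade on this bar
            results.append(0.0)
    return results


def sharpe_ratio(results):
    """Mean result divided by its sample standard deviation."""
    n = len(results)
    if n < 2:
        return 0.0
    mean = sum(results) / n
    variance = sum((r - mean) ** 2 for r in results) / (n - 1)
    if variance == 0.0:
        return 0.0
    return mean / math.sqrt(variance)


# Hypothetical forecasts and actual increments, used only as sample data.
forecasts = [1, -1, 1, 0]
increments = [0.0010, -0.0004, -0.0002, 0.0003]
results = trade_results(forecasts, increments, spread=0.0001)
fitness = sharpe_ratio(results)
```

The cumulative sum of the per-bar results gives the chart itself; the Sharpe ratio is the single number the model selection then tries to increase.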
> In general, I would like to know which networks work.
I currently take open prices and use them with (homemade) indicators to create new features. I train a neural network that predicts the price increment per bar. Creating new indicators takes a lot of time, but without them the model will not beat the spread.
I've looked through your files and see that you already have a lot of predictors. If your target is just a set of -1, 0, 1, use a forest. If you want to predict the price, use a neural network instead.
I misunderstood... yes, the points where the maximum profit is possible, of course.
For classification, use a multilayer perceptron with a softmax output layer (it outputs the probabilities of belonging to each class).
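For reference, the softmax itself is simple. The sketch below takes the raw outputs of the last layer (the scores here are made up) and turns them into class probabilities; the class names follow the buy / sell / do nothing case discussed above.

```python
import math


def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the three classes.
classes = ["sell", "do nothing", "buy"]
probabilities = softmax([0.2, 1.5, -0.3])
predicted = classes[probabilities.index(max(probabilities))]
```

The predicted class is the one with the highest probability; the probabilities themselves can also be thresholded if you only want to trade on confident signals.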
Have you read this? https://www.mql5.com/ru/articles/497 It uses a single neuron as the example. Then imagine that there are many of them, and that's the whole neural network.
It describes exactly the threshold functions you were asking about.
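In case it helps, a single neuron is roughly this: a weighted sum of the inputs passed through an activation function. This is my own minimal sketch, not code from the article; the inputs, weights and bias are arbitrary.

```python
import math


def step(x, threshold=0.0):
    """Threshold activation: fires (1) once the sum reaches the threshold."""
    return 1 if x >= threshold else 0


def sigmoid(x):
    """Smooth activation with output strictly inside (0;1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


# Arbitrary sample values, chosen only for illustration.
inputs = [0.5, -1.0, 0.25]
weights = [0.8, 0.2, -0.4]
bias = 0.1
fired = neuron(inputs, weights, bias, step)
smooth = neuron(inputs, weights, bias, sigmoid)
```

A network is many such neurons arranged in layers, with the outputs of one layer used as the inputs of the next.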
Well, it looks like a good result, yeah.
I missed that article, thanks for pointing it out; some things have cleared up! But not everything at once, you have to read these things a couple of times... I understand the part about assigning the coefficients and passing their sum through a function.
Tried the 1st file, split it into 3 parts:

Training
              Predicted
Actual        0       1
0         28107    1244
1          3045    4119

Test 1
              Predicted
Actual        0       1
0          5950     356
1           742     776

Test 2
              Predicted
Actual        0       1
0          5945     333
1           779     769

Calculated with nnet, 10 neurons in the hidden layer (the neural network from the Rattle package for R).
Worse than your forest, but not bad either. The second file will probably give the same results.
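To read the tables more easily, here is a small sketch that computes the usual summary numbers from them. The counts are the ones from the tables above (rows are actual, columns are predicted); the function and key names are mine.

```python
def summarize(tn, fp, fn, tp):
    """Summary metrics for a 2x2 confusion matrix (class 1 is the positive class)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall_0": tn / (tn + fp),      # share of actual zeros found
        "recall_1": tp / (fn + tp),      # share of actual ones found
        "precision_1": tp / (fp + tp),   # share of predicted ones that are correct
        # accuracy of always predicting the more frequent class
        "baseline": max(tn + fp, fn + tp) / total,
    }


# (actual 0 -> pred 0, actual 0 -> pred 1, actual 1 -> pred 0, actual 1 -> pred 1)
tables = {
    "Training": (28107, 1244, 3045, 4119),
    "Test 1": (5950, 356, 742, 776),
    "Test 2": (5945, 333, 779, 769),
}
summary = {name: summarize(*counts) for name, counts in tables.items()}

for name, metrics in summary.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

On Test 1 this gives an accuracy of about 0.86 against a baseline of about 0.81 for always predicting 0, with about 0.94 of the zeros found but only about 0.51 of the ones.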
Thanks! I think these results can be used as a filter, i.e. as a ban on trading, since the zeros are predicted more reliably.
Well, there are simply more of them in the data, which is why they are easy to guess :)
That doesn't matter; the main thing is to know in which areas the risk for trading is elevated, and the results more or less showed that.
The other thing is that I have no idea how to turn all this into an indicator. Do I really have to rewrite all the rules that were formed, or what?
A forest simply does worse with sparse features; there will be few splits on them.
And if there are many sparse features and one that is not sparse, the forest will overfit to that one: it will get the highest importance, and the others will have very little effect on the result.
Normal forest or random forest, or both?
I installed R and Rattle (with all their glitches...), and now I can't figure out how to reproduce settings comparable to those in the screenshot below. The standard Rattle settings gave worse results than the program I used before.
Same features, same settings, but the features are collapsed rather than expanded as before.
Old variant
New variant
It found slightly more zeros but considerably fewer ones, almost half as many! I didn't think that collapsing versus expanding the variables could have such an effect...
A normal forest, a random forest and a forest of trees are all the same thing :) A forest is an ensemble of trees.
Are the features collapsed, meaning there are fewer of them, or what? By sparse features I mean ones that rarely change and/or are categorical, like ones and zeros (well, that's a high-level explanation).
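To show what "ensemble" means here, a toy sketch: each tree gives its own answer and the forest takes the majority vote. The three hand-written "trees" and the feature names f1, f2, f3 are hypothetical stand-ins; in a real forest the trees are trained on random subsets of rows and features.

```python
from collections import Counter


def tree_1(x):
    return 1 if x["f1"] > 0.5 else 0


def tree_2(x):
    if x["f2"] > 0:
        return 1
    return 1 if x["f3"] > 2 else 0


def tree_3(x):
    return 1 if x["f1"] > 0.9 else 0


def forest_predict(trees, x):
    """Majority vote over the class predicted by every tree."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_1, tree_2, tree_3]
sample = {"f1": 0.8, "f2": -0.1, "f3": 3}   # hypothetical feature values
prediction = forest_predict(trees, sample)
```

Here two of the three trees vote for class 1, so the forest answers 1 even though tree_3 disagrees.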
I don't use R because the local gurus in their golden spectacles instilled an aversion to it in me.
You'll be fiddling with it for a long time this way; study the theory of what trees are and what a forest of trees is.
On your data, with the file Pred_004_Buy split in half, you can get 0.85 with a straightforward approach.
The data is rubbish and better thrown out. The rest you'll have to figure out on your own. In silence...
Why does the data suck? It's a very good result.