Machine learning in trading: theory, models, practice and algo-trading - page 3629

 
By the way, my approach didn't work on his data, but it's getting results on mine.
 

This is another tricky function that sometimes helps to smooth out the balance curve on new data.

In training, old data is not as important as the newest data: it is a time series, patterns change over time, and so on. Most models support sample weights, so you can give more weight to recent observations and less to older ones.

import numpy as np

def create_sample_weights(X):
    # Split the dataset into 10 equal parts and assign each part a linearly
    # increasing weight: the oldest tenth gets 0.1, the newest tenth gets 1.0
    num_parts = 10
    part_size = len(X) // num_parts
    weights = np.zeros(len(X))

    for i in range(num_parts):
        start_idx = i * part_size
        end_idx = (i + 1) * part_size if i != num_parts - 1 else len(X)
        weights[start_idx:end_idx] = (i + 1) / num_parts

    return weights
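
A minimal usage sketch, assuming X_train and y_train already exist (they are placeholders, not names from the original post) and using sklearn's gradient boosting as a stand-in for whatever model is actually trained: the weights are simply passed to fit() via the sample_weight argument.

from sklearn.ensemble import GradientBoostingClassifier

# X_train / y_train are hypothetical placeholders for your own dataset
weights = create_sample_weights(X_train)
model = GradientBoostingClassifier()
model.fit(X_train, y_train, sample_weight=weights)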
 

I wonder if anyone has managed to use standard approaches to feature selection for trading purposes.

I tried using lasso and multicollinearity control to find FA features, but I did not find any significant ones at all.
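
For reference, a rough sketch of what this standard combination (lasso plus a VIF-based multicollinearity check) might look like with common Python packages. This is only an illustration, not a reconstruction of the actual code (which may well have been in R); the function name and the threshold are assumptions.

import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor

def lasso_with_vif(X: pd.DataFrame, y: np.ndarray, vif_threshold: float = 10.0):
    # 1. Drop the most collinear columns one by one (highest VIF first)
    cols = list(X.columns)
    while len(cols) > 1:
        vifs = [variance_inflation_factor(X[cols].values, i) for i in range(len(cols))]
        worst = int(np.argmax(vifs))
        if vifs[worst] < vif_threshold:
            break
        cols.pop(worst)

    # 2. Lasso with built-in cross-validation on standardised features;
    #    features with non-zero coefficients are the "selected" ones
    X_scaled = StandardScaler().fit_transform(X[cols])
    lasso = LassoCV(cv=5).fit(X_scaled, y)
    return [c for c, w in zip(cols, lasso.coef_) if w != 0.0]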


 
Aleksey Nikolayev #:

I wonder if anyone has managed to use standard approaches to feature selection for trading purposes.

I tried using lasso and multicollinearity control to find FA features, but I did not find any significant ones at all.


I was not successful :)
They start to interact differently when you remove some of them, and you have to start all over again.
It would be interesting to hear your opinion here https://www.mql5.com/ru/forum/476323/page3#comment_55120093

 
Aleksey Nikolayev #:
I didn't find any significant ones at all.

Maybe it's the target? What is it?

And are the predictors raw values of the economic indicators, their returns, or their deviations from the forecast?

 
Aleksey Vyazmikin #:

Maybe it's the target? What is it?

And are the predictors raw values of the economic indicators, their returns, or their deviations from the forecast?

There were several variants of targets: log returns, differences between bar midpoints, volatility. There were also classification variants (continuation of the move / reversal), etc.

The features were taken from the local calendar.

But the essence of my question is different. It is a mini-survey: has anyone managed to benefit from the standard ML methods of feature selection?

 
Aleksey Nikolayev #:
There were several variants of targets: log returns, differences between bar midpoints, volatility. There were also classification variants (continuation of the move / reversal), etc.

I.e. the targets are about the impact of the news on the price in the hours after its release?

Aleksey Nikolayev #:
The features were taken from the local calendar.

I think they need additional preprocessing. Perhaps some indicators have natural trends...

Aleksey Nikolayev #:

But the essence of my question is different. It is a mini-survey: has anyone managed to benefit from the standard ML methods of feature selection?

And what do you mean by benefit?

Usually there are two goals:

1. To reduce the number of features.

2. To improve the results of learning on new data

The first goal is achievable with various standard methods. Earlier on the forum I already showed examples where a model built on a couple of predictors was comparable to one with a hundred predictors.

The second goal is also achievable, but it is a matter of randomness: since the market keeps changing, the results can drift away on new data.
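
As an aside on the first goal: the posts do not say which standard methods were actually used, so the sketch below is only one illustrative option - recursive feature elimination via sklearn's RFECV with a random forest and a time-series split. X and y are hypothetical placeholders for the feature matrix and target.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import TimeSeriesSplit

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    step=1,                          # drop one feature per iteration
    cv=TimeSeriesSplit(n_splits=5),  # respect the time order of observations
    scoring="accuracy",
)
selector.fit(X, y)
X_reduced = X[:, selector.support_]  # keep only the retained features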


I think that with economic indicators one should first understand whether, globally, the value of the indicator is positive for a particular currency's exchange rate or not. Then understand why traders react to news that is positive for the economy as if it were negative.

 

I threw out the most rubbish features in terms of "importance", and the results were worse without them than with them.

I got the impression that feature selection methods work either only on low-dimensional data, or only on series with strong patterns.

or do not work at all :)
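
A sketch of the kind of check described above (illustrative only, with assumed variable and function names): drop the lowest-ranked features by the model's built-in importance and compare out-of-sample scores with and without them.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def score_with_and_without_worst(X, y, drop_fraction=0.3, seed=0):
    # keep the time order: the test set is the most recent part
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)

    full = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
    score_full = full.score(X_te, y_te)

    # indices of everything except the least "important" features
    n_drop = int(X.shape[1] * drop_fraction)
    keep = np.argsort(full.feature_importances_)[n_drop:]

    reduced = RandomForestClassifier(random_state=seed).fit(X_tr[:, keep], y_tr)
    score_reduced = reduced.score(X_te[:, keep], y_te)
    return score_full, score_reduced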

 
Aleksey Vyazmikin #:
I.e. the targets are about the impact of the news on the price in the hours after its release?

No, classical macroeconomic models in the spirit of BEER (I mentioned them in my thread about FA) - the influence of FA on large timeframes (monthly in my case).

Aleksey Vyazmikin #:
I think they need additional preprocessing. Perhaps some indicators have natural trends...

There shouldn't be any particular trends there, since the data are mostly flow values (per period) rather than cumulative. As for data normalisation, I relied on what is built into the R models - it is usually enabled by default there.

Aleksey Vyazmikin #:

1. To reduce the number of features.

2. To improve the results of learning on new data

First of all the former, of course: to get the number of features (over 400 for EURUSD initially) below the number of months (just over 200).

Aleksey Vyazmikin #:
The first goal is achievable with various standard methods. Earlier on the forum I already showed examples where a model built on a couple of predictors was comparable to one with a hundred predictors.

I am interested in easily reproducible methods using standard packages. The main thing, of course, is that cross-validation should show sufficient significance of the remaining features - as a rule, that is exactly the problem.

For example, you can always fit a linear regression and then select a few features that are significant for it. But if this is repeated many times during cross-validation, different features turn out to be significant each time, and in the end you may be left with 0 significant features.
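
A sketch of the effect described above, with illustrative names: fit an OLS regression in each cross-validation fold, keep the features with p-value below 0.05, and intersect the sets across folds - on noisy financial data the intersection often ends up empty.

import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import KFold

def stable_significant_features(X, y, n_splits=5, alpha=0.05):
    stable = None
    for train_idx, _ in KFold(n_splits=n_splits).split(X):
        ols = sm.OLS(y[train_idx], sm.add_constant(X[train_idx])).fit()
        # pvalues[0] belongs to the intercept, so it is skipped
        significant = set(np.where(ols.pvalues[1:] < alpha)[0])
        stable = significant if stable is None else stable & significant
    return stable  # may well turn out to be empty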

Aleksey Vyazmikin #:


I think that with economic indicators one should first understand whether, globally, the value of the indicator is positive for a particular currency's exchange rate or not. Then understand why traders react to news that is positive for the economy as if it were negative.

Imho, seeing at least minimal significance in cross-validation is precisely what understanding means here, first of all. The problem is the lack of a consistent, clear market reaction to the macroeconomic backdrop and/or its changes. Not that this is big news - for example, the opening post of my FA thread has links on this topic.
 
Maxim Dmitrievsky #:

I threw out the most rubbish features in terms of "importance", and the results were worse without them than with them.

I got the impression that feature selection methods work either only on low-dimensional data, or only on series with strong patterns.

or do not work at all :)

I would like at least minimal significance to be noticeable in cross-validation. But that does not seem to be the case for FA)