Machine learning in trading: theory, models, practice and algo-trading - page 3723

 
СанСаныч Фоменко #:

What do you mean, "patterns"?

In a random forest the "patterns" are the trees themselves, and they don't repeat. But there is another observation about random forest.

With a sample of 1500 bars, about 70 trees are sufficient, after which the error stabilises. Another interesting thing: if we take more than 1500 bars, for example 5000, then 70 trees are still enough. But it is not clear whether these are the same trees or different ones. I suspect they are different, i.e. a "non-stationarity" of trees is observed, which is why it does not work when sliding a window outside the sample.

We are not looking at tree patterns, only price patterns; I don't think that needs explaining.

There it depends more on the tree depth settings than on the number of trees. Increasing the number of trees reduces overfitting, when that is possible at all. If a larger sample requires deeper trees, then the series is non-stationary, similar to the situation with an increasing number of features. That alone shows we are engaged in curve fitting and p-hacking.
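The "error stabilises after ~70 trees" observation above is easy to check on synthetic data. A minimal sketch (mine, not from the thread) using scikit-learn's out-of-bag error as the error estimate; the sample size of 1500 and the tree counts mirror the numbers quoted above, the data itself is an arbitrary toy signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 10))                       # 1500 samples, 10 features
y = (X[:, 0] + rng.normal(scale=0.5, size=1500) > 0).astype(int)

errors = {}
for n_trees in (30, 70, 150, 300):
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                random_state=0, n_jobs=-1)
    rf.fit(X, y)
    errors[n_trees] = 1 - rf.oob_score_               # OOB error for this forest size
    print(n_trees, round(errors[n_trees], 3))
```

On stationary data like this, the OOB error indeed flattens out well before a few hundred trees; on real price series the question raised above is whether the *trees themselves* stay the same as the window moves, which OOB error alone cannot answer.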
 
СанСаныч Фоменко #:

It's all dark and hopeless....

I agree that the problem of nonstationarity as such is probably not solvable today.

But it should be solved in parts.

Predictors.

Do we take them in their original form, or is preprocessing required? What is the purpose of preprocessing, and what are the quality criteria?

Relationship between predictors and the target.

Has anyone made a histogram of this relationship? What are the criteria for a "good" relationship?

Probability of class prediction.

Has anyone plotted histograms of these probabilities? I have, and got some surprising, unexplained results.

There is no constructive definition of non-stationarity - it is simply the absence of stationarity.

There are no universal methods for dealing with non-stationarity. For example, constructing a histogram from a sample makes no sense if the sample is not i.i.d., i.e. non-stationary - if there is no distribution common to all elements of the sample, the histogram cannot be an approximation for anything.
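The point about histograms over non-i.i.d. samples can be made concrete with a toy example (mine, not the poster's): pool two regimes with different means and the resulting histogram approximates neither regime's distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
regime1 = rng.normal(0, 1, 1000)              # first "regime" of the series
regime2 = rng.normal(5, 1, 1000)              # second regime, different mean
sample = np.concatenate([regime1, regime2])   # pooled non-i.i.d. sample

hist, edges = np.histogram(sample, bins=30)
# The pooled histogram is bimodal: it is an approximation of neither
# regime, and the pooled mean (~2.5) describes no part of the process.
print(round(sample.mean(), 2))
```

There is no single distribution that all 2000 points are drawn from, so the histogram is not an estimate of anything, exactly as stated above.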

The only possibility is to empirically guess a model of non-stationarity suitable in practice. The main problem here is that the overwhelming majority of nonstationarity is unmodellable in principle.

 
Mihail Marchukajtes #:
Unfortunately, it is the nonstationarity of the series that kills the result of any strategy over time. Isn't that so?
Right. A purely mathematical approach to any problem does not work without clarifying the problem statement.
 

Decent ML models are obtained on divergence-based labeling (trades are marked only when there is confirmation from a divergence). It is stable on OOS. I checked it today; I have not optimised the parameters yet.



 
Practice shows (contrary to the general theory) that in Forex it is best to work through labelings rather than features. The collective trader unconscious has accumulated some more or less comprehensible labelings derived from strategies.
 

This labeling method is inexplicably effective, with various min_val, max_val parameters:

import random

from numba import njit


@njit
def calculate_labels_divergence(close_data, rsi_data, markup, min_val, max_val):
    labels = []
    for i in range(max_val, len(close_data) - max_val):
        bullish_div = False
        bearish_div = False

        for lookback in range(min_val, max_val + 1):
            prev_idx = i - lookback

            # Bullish divergence: price lower, RSI higher
            if close_data[i] < close_data[prev_idx] and rsi_data[i] > rsi_data[prev_idx]:
                bullish_div = True
                break

            # Bearish divergence: price higher, RSI lower
            if close_data[i] > close_data[prev_idx] and rsi_data[i] < rsi_data[prev_idx]:
                bearish_div = True
                break

        # Check the future price for profitability at a random horizon
        if bullish_div or bearish_div:
            rand = random.randint(min_val, max_val)
            curr_pr = close_data[i]
            future_pr = close_data[i + rand]

            if bullish_div:
                # Buy (label 0.0): the price must rise by more than the markup
                if (future_pr - markup) > curr_pr:
                    labels.append(0.0)
                else:
                    labels.append(2.0)
            elif bearish_div:
                # Sell (label 1.0): the price must fall by more than the markup
                if (future_pr + markup) < curr_pr:
                    labels.append(1.0)
                else:
                    labels.append(2.0)
        else:
            # No trade (label 2.0)
            labels.append(2.0)

    return labels
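The labeler above takes rsi_data as given. For completeness, here is a minimal Wilder-style RSI sketch to produce it; the period of 14 is the conventional default and an assumption of mine, not something specified in the post:

```python
import numpy as np

def rsi(close, period=14):
    """Crude Wilder RSI; returns an array aligned with close (NaN warm-up)."""
    delta = np.diff(close)
    gain = np.where(delta > 0, delta, 0.0)
    loss = np.where(delta < 0, -delta, 0.0)
    avg_gain = np.full(len(delta), np.nan)
    avg_loss = np.full(len(delta), np.nan)
    avg_gain[period - 1] = gain[:period].mean()
    avg_loss[period - 1] = loss[:period].mean()
    for i in range(period, len(delta)):          # Wilder's smoothing
        avg_gain[i] = (avg_gain[i - 1] * (period - 1) + gain[i]) / period
        avg_loss[i] = (avg_loss[i - 1] * (period - 1) + loss[i]) / period
    rs = avg_gain / np.maximum(avg_loss, 1e-12)  # guard against zero losses
    out = 100.0 - 100.0 / (1.0 + rs)
    return np.concatenate(([np.nan], out))       # align with close
```

With this, calculate_labels_divergence(close, rsi(close), markup, min_val, max_val) can be run on any price array long enough to cover the warm-up period.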


 

Useful information for ML experimenters about trend and flat trading systems (TS). It didn't seem to get attention before, but it came up in the course of experimentation.

That's a great observation! It is indeed true, and here's why.

<Edited by moderator : Please refrain from posting ChatGPT-generated messages; it's not allowed on this forum>

 
Maxim Dmitrievsky #:
That's a great observation! It's really true, and here's why.

Did you use ChatGPT?

Because this opus looks too much like its output.

 
Maxim Kuznetsov #:

Did you use ChatGPT?

Because this opus looks too much like its output.

The science of hyphens is too complicated.
 
Maxim Dmitrievsky #:

Statistical validation

Predictability can be objectively assessed using the Hurst exponent (H):

  • H < 0.5 - mean reversion (flat): past data has predictive power;

  • H = 0.5 - random walk: past data does not predict the future;

  • H > 0.5 - trending: past data is less informative for prediction.

Twenty bars will not be enough to estimate these statistics.
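For reference, one common way to estimate H is the variance-of-lagged-differences method: the standard deviation of lag-k differences scales roughly as k**H. A sketch (one estimator among several, not code from the thread); note it uses 5000 points, consistent with the remark that 20 bars are far too few:

```python
import numpy as np

def hurst(series, max_lag=100):
    """Crude Hurst estimate: std of lag-k differences scales as k**H."""
    lags = np.arange(2, max_lag)
    tau = np.array([np.std(series[lag:] - series[:-lag]) for lag in lags])
    # slope of log(std) vs log(lag) is the Hurst exponent estimate
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=5000))  # H should come out near 0.5
white_noise = rng.normal(size=5000)             # strongly mean-reverting: H near 0
print(round(hurst(random_walk), 2), round(hurst(white_noise), 2))
```

On short windows (tens of bars) the log-log fit has very few usable lags and the estimate is dominated by noise, which is exactly why 20 bars give nothing reliable.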
Maxim Dmitrievsky #:

Key paradox

In a trending market, past prices signal "the price will go up", but do not indicate when the trend will end. It is this uncertainty that makes forecasting in trends much more difficult than in flat markets.

For this reason, ML models in flat markets are often successfully trained on historical patterns, while in trending markets they are often ineffective due to the rapid loss of data relevance.

And by that logic, flat markets are eternal, or what?