Machine learning in trading: theory, models, practice and algo-trading - page 3435

 
Aleksey Vyazmikin #:

I'm sharing the results. The tree is [3,3,3,3]: at each level every leaf is clustered into 3 clusters, for a total of 3 levels, which gives 27 final leaves.

The result is applied to 3 samples; the test and exam samples did not participate in building the tree.

The graph below shows, for each leaf of the tree, the shift of target "1" in percent relative to the average rate of target "1" in its sample.

What is pleasing is the stability of the shift: where it was above or below zero on the train sample, its sign often persists on the other samples.

However, if we take the examples in the train sample with a shift of more than 5%, we get only about 20% of the examples in the whole sample, which is not enough.
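The per-leaf shift metric described above can be sketched as follows. This is an assumption about the calculation, not the author's actual code: it takes a binary target and an integer leaf id per row, and returns each leaf's rate of target "1" as a percentage shift relative to the sample-wide rate.

```python
import numpy as np
import pandas as pd

def leaf_shift(leaves, y):
    """Per-leaf shift of the target-'1' rate, in percent,
    relative to the average rate over the whole sample."""
    df = pd.DataFrame({"leaf": leaves, "y": y})
    base = df["y"].mean()                  # sample-wide rate of target "1"
    rate = df.groupby("leaf")["y"].mean()  # per-leaf rate
    return (rate / base - 1.0) * 100.0     # shift in percent vs. the mean

# toy data: 1000 examples spread over 27 hypothetical leaves
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
leaves = rng.integers(0, 27, 1000)
print(leaf_shift(leaves, y).head())
```

A leaf with a shift of +5% then means its share of target "1" is 5% above the sample average, which matches how the graphs above are described.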

50 clusters. Clustering by volatility.


Learn 0 BATCH 0 model
R2: -0.6975072967971953
Learn 0 BATCH 1 model
R2: 0.9475296697469157
Learn 0 BATCH 2 model
R2: 0.016017417967087666
Learn 0 BATCH 3 model
R2: -0.4384662368711072
Learn 0 BATCH 4 model
R2: -0.815427090254198
Learn 0 BATCH 5 model
R2: 0.22019051717293892
Learn 0 BATCH 6 model
R2: -0.3800956817962319
Learn 0 BATCH 7 model
R2: 0.07809908347775751
Learn 0 BATCH 8 model
/Users/dmitrievsky/miniforge3/lib/python3.10/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
R2 is fixed to -1.0
R2: -1.0
Learn 0 BATCH 9 model
R2: 0.004266344436175684
Learn 0 BATCH 10 model
R2: 0.4981881787045557
Learn 0 BATCH 11 model
R2: 0.0053097855769144164
Learn 0 BATCH 12 model
R2: 0.054027256930358925
Learn 0 BATCH 13 model
R2: 0.5899093013972916
Learn 0 BATCH 14 model
R2: -0.4946899648797436
Learn 0 BATCH 15 model
R2: 0.7939071178120198
Learn 0 BATCH 16 model
R2: 0.6477014231474504
Learn 0 BATCH 17 model
R2: 0.47273004383962347
Learn 0 BATCH 18 model
R2: 0.248941419849391
Learn 0 BATCH 19 model
R2: 0.800761188745112
Learn 0 BATCH 20 model
R2: 0.11376642072996479
Learn 0 BATCH 21 model
/Users/dmitrievsky/miniforge3/lib/python3.10/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
R2 is fixed to -1.0
R2: -1.0
Learn 0 BATCH 22 model
R2: 0.7838797961879156
Learn 0 BATCH 23 model
R2: -0.4268170909757055
Learn 0 BATCH 24 model
R2: 0.5737133818949138
Learn 0 BATCH 25 model
R2: 0.36270135520520974
Learn 0 BATCH 26 model
R2: -0.12049707206984317
Learn 0 BATCH 27 model
R2: 0.27317804513554567
Learn 0 BATCH 28 model
R2: -0.005998374759471736
Learn 0 BATCH 29 model
R2: 0.32044801432519554
Learn 0 BATCH 30 model
R2: 0.3106442469001309
Learn 0 BATCH 31 model
R2: 0.6887039726402294
Learn 0 BATCH 32 model
R2: -0.5321374845982312
Learn 0 BATCH 33 model
/Users/dmitrievsky/miniforge3/lib/python3.10/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
R2 is fixed to -1.0
R2: -1.0
Learn 0 BATCH 34 model
R2: -0.036793036508273036
Learn 0 BATCH 35 model
R2: -0.13234917037561222
Learn 0 BATCH 36 model
R2: -0.7195407853537261
Learn 0 BATCH 37 model
/Users/dmitrievsky/miniforge3/lib/python3.10/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
R2 is fixed to -1.0
R2: -1.0
Learn 0 BATCH 38 model
R2: 0.7462004403933388
Learn 0 BATCH 39 model
R2: -0.014321696157866604
Learn 0 BATCH 40 model
R2: 0.8581957623344842
Learn 0 BATCH 41 model
R2: -0.5150586236832828
Learn 0 BATCH 42 model
R2: 0.88749024612301
Learn 0 BATCH 43 model
R2: -0.02320296391829768
Learn 0 BATCH 44 model
R2: 0.5986943139612424
Learn 0 BATCH 45 model
/Users/dmitrievsky/miniforge3/lib/python3.10/site-packages/sklearn/metrics/_regression.py:1187: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
R2 is fixed to -1.0
R2: -1.0
Learn 0 BATCH 46 model
R2: 0.08375468541665188
Learn 0 BATCH 47 model
R2: 0.7318534153534879
Learn 0 BATCH 48 model
R2: 0.3927709261051455
Learn 0 BATCH 49 model
R2: 0.8029591287960962
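The repeated UndefinedMetricWarning in the log above comes from sklearn's r2_score, which is not defined for fewer than two samples; the log shows the score being pinned to -1.0 in that case. A guard that reproduces that behavior could look like this (a sketch, not the author's actual code):

```python
from sklearn.metrics import r2_score

def safe_r2(y_true, y_pred):
    """R^2 is undefined for fewer than 2 samples; pin it to -1.0
    as the log output above does ("R2 is fixed to -1.0")."""
    if len(y_true) < 2:
        return -1.0
    return r2_score(y_true, y_pred)

print(safe_r2([1.0], [1.0]))                       # -1.0 (too few samples)
print(safe_r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 1.0 (perfect fit)
```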

Best results:


 
Aleksey Vyazmikin #:

And this is how the leaves of the 3-3-3-3 tree look; below are separate graphs for each sample, in the order train, test, exam.



Well, we can see that the probability shift in the leaves is already larger, but instability is also starting to appear. The share of examples in the positive leaves (clusters) decreases in the subsequent samples, which once again confirms that the characteristic feature combinations are unique to a particular section of trading history. Note that the sample is difficult: only 10% of the examples are positive (target "1").

Try EURGBP


 
Maxim Dmitrievsky #:

Try EURGBP

Does your tester count the entry at the close of the signal candle or at the open of the next candle?

Is the commission counted?
 
Aleksey Vyazmikin #:

It looks very interesting. Any details?

I will, but more on that later.
 
mytarmailS #:
Does your tester count the entry at the close of a signal candle or at the open of the next candle?

Is the commission counted?

Ask chatgpt to rewrite to R %)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# FORWARD, BACKWARD (timestamps) and MARKUP (commission in pips)
# are module-level constants defined elsewhere in the script.

def tester(dataset: pd.DataFrame, plot=False):
    last_deal = 2       # 2 = flat, otherwise holds the label of the open deal
    last_price = 0.0
    report = [0.0]      # cumulative profit, commission (MARKUP) included
    chart = [0.0]       # cumulative raw price change, without commission
    line = 0
    line2 = 0

    indexes = pd.DatetimeIndex(dataset.index)
    labels = dataset['labels'].to_numpy()
    metalabels = dataset['meta_labels'].to_numpy()
    close = dataset['close'].to_numpy()

    for i in range(dataset.shape[0]):
        if indexes[i] <= FORWARD:
            line = len(report)
        if indexes[i] <= BACKWARD:
            line2 = len(report)

        pred = labels[i]
        pr = close[i]
        pred_meta = metalabels[i]  # 1 = allow trades

        # open a deal at the current close
        if last_deal == 2 and pred_meta == 1:
            last_price = pr
            last_deal = 0 if pred <= 0.5 else 1
            continue

        # close the deal opened with label 0
        if last_deal == 0 and pred > 0.5 and pred_meta == 1:
            last_deal = 2
            report.append(report[-1] - MARKUP + (pr - last_price))
            chart.append(chart[-1] + (pr - last_price))
            continue

        # close the deal opened with label 1
        if last_deal == 1 and pred < 0.5 and pred_meta == 1:
            last_deal = 2
            report.append(report[-1] - MARKUP + (last_price - pr))
            chart.append(chart[-1] + (pr - last_price))

    # score the equity curve: R^2 of a linear fit, signed by the slope
    y = np.array(report).reshape(-1, 1)
    X = np.arange(len(report)).reshape(-1, 1)
    lr = LinearRegression()
    lr.fit(X, y)

    sign = 1 if lr.coef_[0][0] >= 0 else -1

    if plot:
        plt.plot(report)
        plt.plot(chart)
        plt.axvline(x=line, color='purple', ls=':', lw=1, label='OOS')
        plt.axvline(x=line2, color='red', ls=':', lw=1, label='OOS2')
        plt.plot(lr.predict(X))
        plt.title("Strategy performance R^2 " + format(lr.score(X, y) * sign, ".2f"))
        plt.xlabel("the number of trades")
        plt.ylabel("cumulative profit in pips")
        plt.show()

    return lr.score(X, y) * sign
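The tail of the tester scores the strategy by fitting a straight line to the cumulative-profit curve and signing the R^2 by the slope, so a steadily falling equity curve gets a negative score even though the fit itself is good. That metric can be reproduced standalone:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def equity_r2(report):
    """R^2 of a linear fit to the equity curve, signed by the slope,
    as in the tester above."""
    y = np.array(report, dtype=float).reshape(-1, 1)
    X = np.arange(len(report)).reshape(-1, 1)
    lr = LinearRegression().fit(X, y)
    sign = 1.0 if lr.coef_[0][0] >= 0 else -1.0
    return lr.score(X, y) * sign

print(equity_r2([0, 1, 2, 3, 4]))   # 1.0: perfectly straight rising curve
print(equity_r2([0, -1, -2, -3]))   # -1.0: perfectly straight falling curve
```

This explains the range of values in the training log above: scores near 1 are smooth rising curves, scores near -1 are smooth falling ones, and scores near 0 are noisy curves with no clear trend.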
 
Maxim Dmitrievsky #:

Ask chatgpt to rewrite to R %)

It would be nice to know what's in the input dataframe.

So I take it the entry is at the current close?
 
Maxim Dmitrievsky #:

50 clusters. Clustering by volatility.


Best results:


How the volatility was measured can remain a mystery, but do write how you evaluated the result within the clusters in order to classify them, otherwise it is not clear.

 
Maxim Dmitrievsky #:

Try EURGBP


Looks beautiful. I'll test on different instruments a little later - for now I need to set up the tooling.

 
Alexey Burnakov #:

Ok.

If someone solves it, or at least comes close to the right solution (that is, if the topic stays alive), then I will:

post the correct solution - the algorithm for generating the dataset;

explain why a number of other "predictor estimation and selection" algorithms failed;

post my own method, which solves similar problems robustly and sensitively - I'll give the theory and post the code in R.

This is done for mutual enrichment of "understanding" of machine learning tasks.

Keep Posting!
 
mytarmailS #:
It would be nice to know what's in the dataframe on the input.

So I take it the entry is at the current close?
The dataframe contains the closes and the labels, highlighted in yellow. Nothing else is required for testing.
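Reading the column accesses in the tester code above, the input dataframe needs exactly three columns and a datetime index. Here is a hypothetical dataframe in that shape (column names are taken from the tester; the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# toy input for tester(): DatetimeIndex plus the three columns it reads
idx = pd.date_range("2023-01-01", periods=5, freq="h")
dataset = pd.DataFrame({
    "close": [1.10, 1.11, 1.12, 1.11, 1.13],  # close prices
    "labels": [0, 1, 1, 0, 1],                # direction label, compared to 0.5
    "meta_labels": [1, 1, 0, 1, 1],           # 1 = trading allowed
}, index=idx)
print(dataset)
```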