Machine learning in trading: theory, models, practice and algo-trading - page 3589

 
Man, this competition keeps me up at night, I can't find a proper data conversion
[Deleted]  
mytarmailS #:
Man, this competition keeps me up at night, I can't find a proper data conversion

Feed the data itself to ChatGPT, maybe there's something similar in its database )

[Deleted]  

I haven't seen anyone do that. Smoothing the labels. It's sometimes a good way to avoid overtraining, and it can give you a variety of models.

If you are limited in the variety of markups, or want to partially remove overtraining, you can use it.

The longer the smoothing period, the less overtraining.
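A minimal sketch of the label-smoothing idea, assuming binary 0/1 labels in a pandas Series; the example series and the period below are illustrative, not data from the thread.

```python
import pandas as pd

def smooth_labels(labels: pd.Series, period: int) -> pd.Series:
    """Smooth 0/1 labels with a centred rolling mean, then re-threshold at 0.5.

    Longer periods wash out short-lived label flips, which is the
    claimed mechanism for reducing overtraining.
    """
    smoothed = labels.rolling(period, min_periods=1, center=True).mean()
    return (smoothed > 0.5).astype(int)

labels = pd.Series([1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0])
print(smooth_labels(labels, period=5).tolist())
# → [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  (isolated flips removed)
```

Note the tie behaviour: a window mean of exactly 0.5 maps to 0 here. As the period grows, the relabelled series degenerates toward a single regime, consistent with "the longer the period, the less overtraining".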

[Deleted]  
Maxim Dmitrievsky #:

I haven't seen anyone do that. Smoothing the labels. It's sometimes a good way to avoid overtraining, and it can give you a variety of models.

If you are limited in the variety of markups, or want to partially remove overtraining, you can use it.

The longer the smoothing period, the less overtraining.

[Charts removed: the original labels, and smoothed versions with periods 5, 15, 25, 50, 100, 200 and 300.]


[Deleted]  
The optimal smoothing period is related to the average trade duration.
[Deleted]  
Everyone remembers the correlation curves (searching for similar patterns in the time series) and the fan of future values that no longer correlate.

This should pretty much kill overtraining, and model complexity can be adjusted arbitrarily. I asked an LLM to formulate it:

Using clustering to find similar patterns, then relabelling based on the average label of each cluster, can be an effective way to improve the quality of dataset markup. Here is a step-by-step methodology for how this can be done:

1. **Dataset Preparation**:
- Load your dataset and make sure the data is normalised or standardised.
- Create additional attributes if needed, such as technical indicators for financial data.

2. **Select a clustering method**:
- Choose an appropriate method such as K-means, DBSCAN, or hierarchical clustering.

3. **Determine the number of clusters**:
- Use the elbow method or silhouette analysis to determine the optimal number of clusters.

4. **Application of clustering**:
- Apply the selected clustering method to your data.
- Analyse the clustering results to understand which patterns or groups stand out.

5. **Calculate the average label for each cluster**:
- For each cluster, calculate the average label based on the raw labels.
- If the average label is close to 0.5, this may indicate uncertainty in the cluster and further analysis may be required.

6. **Relabel the data**:
- Use the average cluster labels to relabel the data. For example, if a cluster's average label is greater than 0.5, all data points in that cluster can be labelled "buy" (label 1), otherwise "sell" (label 0).

7. **Evaluate the quality of the new markup**:
- Use metrics such as precision, recall and F1-score to evaluate the quality of your new markup.
- Backtest on historical data to see whether the new markup leads to better results.

8. **Iterative Improvement**:
- Repeat the process with feedback and new data to improve the quality of your markup and clustering.
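The numbered steps above can be sketched roughly as follows. This is my own illustration using scikit-learn (K-means plus silhouette-based selection of the cluster count) on synthetic data; none of the names or numbers come from the thread.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def cluster_relabel(X, y, candidate_ks=(2, 3, 4, 5), seed=0):
    """Steps 1-6: standardise, pick k by silhouette, cluster, relabel."""
    Xs = StandardScaler().fit_transform(X)          # step 1: standardise features
    best_k, best_score = None, -1.0
    for k in candidate_ks:                          # step 3: silhouette analysis
        ids = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xs)
        score = silhouette_score(Xs, ids)
        if score > best_score:
            best_k, best_score = k, score
    ids = KMeans(n_clusters=best_k, n_init=10,      # step 4: final clustering
                 random_state=seed).fit_predict(Xs)
    new_y = y.copy()
    for c in range(best_k):
        mask = ids == c
        new_y[mask] = int(y[mask].mean() > 0.5)     # steps 5-6: mean label -> 0/1
    return new_y, best_k

# Two well-separated synthetic "patterns" with 15% noisy labels:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(4, 1, (50, 4))])
y = np.concatenate([np.ones(50, int), np.zeros(50, int)])
y[rng.choice(100, 15, replace=False)] ^= 1          # inject label noise
y_fixed, k = cluster_relabel(X, y)
```

On data this cleanly separated the silhouette picks k = 2 and the per-cluster majority vote removes the injected label noise; on real features the choice of k (and of the clustering method itself) is where the work actually is.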

[Deleted]  
This is a riff on replacing Alexey's rectangle fitting with something handier :) at least in terms of overtraining.
The second method (clustering) looks more interesting than the first (label smoothing), because it takes the structure of the dataset (its patterns) into account.
Model complexity can be adjusted through the number of clusters.
 
Maxim Dmitrievsky #:

Feed the data itself to ChatGPT, maybe there's something similar in its database )

GPT, hell... it took me three days just to clean the dirty data into a normal tabular form and synchronise everything by time, and then I rewrote everything from scratch again....

This is real data, baby... with sensor errors, with noise in the values, with sloppy recording, with completely different feature scales in each observation, with wild multicollinearity, with time discrepancies, and so on.

I would like some of you to feel the same pain that I feel )
[Deleted]  
mytarmailS #:
It took me three days just to clean the dirty data into a normal tabular form and synchronise everything by time, and then I rewrote everything from scratch again....

This is real data, baby... with sensor errors, with noise in the values, with sloppy recording, with completely different feature scales in each observation, with wild multicollinearity, with time discrepancies, and so on.

I would like some of you to feel the same pain that I feel )
I know :) I once helped with sensor-signal processing in the gas-chemical industry. But there was a whole team working on it. Good luck 😁
[Deleted]  

That motivated me.

An ultimate must-have function, thanks to which the model never overtrains. But don't expect any graceful backtests either.

The number of clusters corresponds to the number of patterns into which you want to divide all the examples; the average label is then calculated for each cluster.

The input is a labelled dataset with features; the output is the same dataset with corrected labels that prevent overtraining.
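A hedged sketch of what such a function's interface might look like, given the description above (labelled dataset in, same dataset with cluster-corrected labels out). The name `correct_labels` and the `n_patterns` parameter are my assumptions, not the poster's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def correct_labels(X: np.ndarray, y: np.ndarray,
                   n_patterns: int, seed: int = 0) -> np.ndarray:
    """Cluster X into n_patterns groups, then replace each point's label
    with the hard-thresholded mean label of its group."""
    ids = KMeans(n_clusters=n_patterns, n_init=10,
                 random_state=seed).fit_predict(X)
    means = np.array([y[ids == c].mean() for c in range(n_patterns)])
    return (means[ids] > 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # placeholder features
y = (rng.random(200) > 0.5).astype(int)   # placeholder raw labels
y_fixed = correct_labels(X, y, n_patterns=10)
```

`n_patterns` is the complexity knob from the post above: by construction every point in a cluster shares one label, so fewer clusters force larger groups of examples into a common label and cap how intricate a decision boundary the downstream model can learn.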


Examples:


Machine learning in trading: theory, models, practice and algo-trading
  • 2024.07.22
  • mytarmailS
  • www.mql5.com
Good afternoon everyone. I know there are machine-learning and statistics enthusiasts on the forum...