Machine learning in trading: theory, models, practice and algo-trading - page 3709
Probably need to think about how to make it even more convenient for any ML person to hook this up to their bot :)
You could also make a notebook in Google Colab, so you can just press play and get an include file with the regimes, without even installing Python :)
It's easier for me to show you again.
When you try to get specific, it only makes things worse.
Initially we were talking about the slopes of the zigzag legs. More precisely, about the fact that for prices (unlike a random walk, SB) the slope of the zigzag legs is very stable.
It would be logical to expect some statistics for such slopes on real prices, compared with what they would be on a random walk. Instead, we see a free flight of artistic thought into unknown distances.
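A minimal sketch of the comparison being asked for, assuming the simplest threshold zigzag (a leg reverses once price pulls back from its running extremum by more than a fixed number of points); the threshold value and the synthetic random walk are stand-ins for whatever real series and parameters one actually uses:

```python
import numpy as np

def zigzag(p, thr):
    # Pivot indices of the simplest zigzag: a leg reverses once price
    # retraces more than `thr` points from its running extremum.
    pivots = []
    trend = 1 if p[1] >= p[0] else -1   # crude guess at the first leg
    i_ext = 0                           # index of the current extremum
    for i in range(1, len(p)):
        if trend == 1:
            if p[i] > p[i_ext]:
                i_ext = i                       # up leg keeps extending
            elif p[i_ext] - p[i] > thr:         # pullback: fix the high
                pivots.append(i_ext)
                trend, i_ext = -1, i
        else:
            if p[i] < p[i_ext]:
                i_ext = i                       # down leg keeps extending
            elif p[i] - p[i_ext] > thr:         # pullback: fix the low
                pivots.append(i_ext)
                trend, i_ext = 1, i
    pivots.append(i_ext)
    return np.array(pivots)

def leg_slopes(p, pivots):
    # Slope of each zigzag leg: price change per bar between pivots.
    return np.diff(p[pivots]) / np.diff(pivots)

rng = np.random.default_rng(0)
rw = np.cumsum(rng.standard_normal(100_000))    # synthetic random walk (SB)
s = leg_slopes(rw, zigzag(rw, thr=5.0))
print("slope dispersion on SB:", np.std(np.abs(s)))
# run the same two lines on a real close series and compare the dispersions
```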
I finished building the regimes in Python and exporting them to a pluggable library; the current regime is then fetched through a function like this.
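The function itself didn't survive the quoting, so here is a purely hypothetical sketch of what such plumbing could look like, assuming the regime model was exported to ONNX (the file name regimes.onnx and the feature vector layout are made up):

```python
import numpy as np
import onnxruntime as ort

# hypothetical export: e.g. a CatBoost model saved as "regimes.onnx"
session = ort.InferenceSession("regimes.onnx")
input_name = session.get_inputs()[0].name

def get_current_regime(features: np.ndarray) -> int:
    # features: shape (1, n_features), float32, built from the latest bars.
    # Returns a regime label, e.g. 0 = flat, 1 = trend up, 2 = trend down.
    outputs = session.run(None, {input_name: features.astype(np.float32)})
    return int(np.asarray(outputs[0]).ravel()[0])
```

The same .onnx file can then be consumed natively from an MQL5 bot, which is presumably the point of the pluggable library.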
What are the regimes? Trend up, trend down, flat?
Anything
For now, anything at all comes out )))
For trend/flat regimes the usefulness is clear, provided they are detected without too much lag. In essence these are zigzags built with ML. But it seems the simplest ZZ with a pullback of no more than xx points already shows everything well, yet its reversals are not predictable.
What else? I don't really understand your idea yet; it's not clear what is being searched for.
Market regimes are the basis of trading
I don't understand the point of shoehorning zigzags into every sentence ) I'd like to understand this method before I die: "Applying the method of eigen-coordinates: If it looks like a cat... it might not be a cat" - https://www.mql5.com/en/articles/412
Apparently it is an attempt to prove the fractality of markets by describing their distributions as q-Gaussian. Presumably some further conclusions should follow once that is proved; one would have to read the primary sources.
In fact, non-extensive statistical mechanics is a theory for multifractal fluctuation analysis, i.e. statistics for systems with "memory", as opposed to purely random ones.
Alternatively, q-Gaussian distributions can be used in ML models instead of Gaussian ones.
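For reference, a minimal numerical sketch of the q-Gaussian itself: the density is the Tsallis q-exponential applied to -βx², and for q > 1 it has power-law tails instead of the Gaussian's exponential ones (normalisation is done numerically on the grid to avoid the case analysis over q):

```python
import numpy as np

def q_exp(x, q):
    # Tsallis q-exponential: exp(x) at q = 1, else [1 + (1-q)x]_+^(1/(1-q))
    if np.isclose(q, 1.0):
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

def q_gaussian_pdf(x, q=1.5, beta=1.0):
    # q-Gaussian density e_q(-beta * x^2), normalised on the given grid
    f = q_exp(-beta * x * x, q)
    return f / np.trapz(f, x)

x = np.linspace(-10, 10, 2001)
for q in (1.0, 1.3, 1.6):
    tail = q_gaussian_pdf(x, q)[np.searchsorted(x, 4.0)]
    print(f"q={q}: density at x=4 is {tail:.2e}")   # fatter tails as q grows
```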
I have an original idea I haven't tried yet. Train several models, not through stacking but separately, each on a different section of the data, and teach a second model to switch between them depending on something (the same features). It would just be such a cool thing; I don't know what the point is, maybe there is some secret :) Curve fitting with raisins.
Found confirmation in an LLM (curve fitting with raisins is already used there):
The Mixture of Experts (MoE) model is an architectural approach that is becoming increasingly popular in large language models (LLM). It allows the creation of very large models with a huge number of parameters while keeping the computational cost acceptable.
How does Mixture of Experts work?
The basic idea behind MoE is that instead of one giant neural network that processes all the input data, the model consists of many specialised "experts". For each specific input (e.g. a token in a text), only a small subset of these experts are activated.
Here are the main components and steps of MoE:
Experts: These are essentially separate, usually small neural networks (often Feed-Forward Networks (FFN) layers), each specialising in processing certain types of data or performing certain tasks. There may be dozens or even hundreds of them.
Gating Network / Router: This is a separate neural network that takes a token (or other representation of input data) as input and decides which experts to direct it to. It calculates "weights" or "scores" for each expert, indicating their relevance to the token.
Expert selection: Based on the scores from the gating network, a subset of experts is selected. A common strategy is "top-k routing", where the k experts with the highest scores are chosen. For example, in the Mixtral 8x7B model, two of the eight available experts are selected for each token.
Combining outputs: The selected experts process the input token in parallel. Their outputs are then combined, often by weighted summation, where the weights are determined by the gating network. This produces a final output that reflects the contributions of the most relevant experts.
In this way, only a small part of the whole model is activated for each token, making both inference and training more computationally efficient despite the huge overall size of the model.
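A toy numpy sketch of the routing and combination steps just described (all sizes and the linear gate are arbitrary illustration, not any particular model):

```python
import numpy as np

rng = np.random.default_rng(42)
d, n_experts, k = 16, 8, 2          # hidden size, number of experts, top-k

W_gate = rng.standard_normal((d, n_experts))                        # gating network
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]   # toy "FFN" experts

def moe_forward(x):
    # x: (d,) token representation -> (d,) combined expert output
    scores = x @ W_gate
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                 # softmax over experts
    top = np.argsort(probs)[-k:]         # top-k routing
    w = probs[top] / probs[top].sum()    # renormalise the selected weights
    # only the k selected experts are evaluated at all: sparse activation
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

print(moe_forward(rng.standard_normal(d)).shape)   # (16,)
```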
Why does this principle give good results?
The Mixture of Experts principle yields good results for several key reasons:
Increased model capacity without a proportional increase in computational cost:
MoE enables models with hundreds of billions or even trillions of parameters. In traditional "dense" models, every parameter is activated at every computation, making them very expensive to train and infer when the number of parameters is large.
In MoE, only a fraction of the parameters are activated for each input, which significantly reduces the computational cost per step, allowing much larger models to be built without a corresponding increase in cost.
Expert specialisation:
Each expert may specialise in processing certain types of information or solving specific subtasks. For example, one expert may be good at mathematical calculations, another at programming, and a third at generating text on a particular topic.
This allows the model to better handle a wide range of tasks and domains, as it can dynamically select the most appropriate experts for a given situation. This is particularly useful in multi-domain scenarios.
Faster training and inference:
Since only part of the model is activated for each token, the learning process can be accelerated as the amount of computation per token is reduced.
Similarly, during inference (response generation), the model runs faster as the entire giant network does not need to be activated and computed. This results in less latency and higher efficiency.
Scalability:
MoE architecture is easily scalable. One can simply add new experts to solve more complex problems or expand the knowledge domain of the model without drastically increasing the computational cost per input.
Flexibility:
The MoE architecture is very flexible. It can be adapted to different tasks and data types by customising the experts and gating networks.
In general, MoE allows large language models to be both very large (due to the huge number of parameters) and efficient (due to sparse activation), leading to significant improvements in performance and response quality across a wide range of tasks.
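A quick back-of-the-envelope illustration of the capacity argument: with 8 experts and top-2 routing, only 2/8 = 25% of the expert parameters participate in any single token's forward pass, so the model carries roughly four times more expert capacity than a dense network of the same per-token compute cost.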
Can a similar scheme be applied to financial time series classification? I first train many CatBoost models on small subsets from the original dataset. Then I need a control model?
Yes, a similar scheme inspired by the principles of Mixture of Experts (MoE) can be applied to financial time series classification. This is an interesting and promising direction, especially given the complexity and dynamism of financial data.
Your idea of using multiple CatBoost models as experts, plus a control model, fits very well with the MoE concept.
Let's look at how this can be implemented and why it can be effective:
Proposed MoE scheme for financial time series
Experts (CatBoost Models):
Training on small datasets/subsets: This is key. Instead of random subsets, it makes sense to form subsets that represent different "regimes" or "states" of the market, or different types of financial instruments. For example:
Data for different periods (volatile market, calm market).
Data for different asset classes (stocks, bonds, currencies, commodities).
Data grouped by specific macroeconomic indicators.
Selections focused on specific patterns or anomalies.
Expert Specialisation: Each trained CatBoost model becomes an "expert" in a particular type of market conditions or data. CatBoost is a good fit because it is robust to noise and handles categorical features natively, both common in financial data.
Expert Output: Each CatBoost expert outputs probability predictions over the classification classes (e.g. up, down, neutral).
Control Model (Gating Network / Router):
Purpose: This model is responsible for selecting one or more of the most relevant CatBoost experts for each new observation.
Input of the control model: It should receive features that characterise the current state of the time series, not necessarily the same ones the experts use for the detailed forecast. These can be:
Macroeconomic indicators.
Volatility indices (VIX).
Trading volumes.
Key technical indicators that quickly reflect the "mode" of the market (e.g. moving averages, MACD, RSI on higher timeframes).
Instrument-specific features (sector, capitalisation, etc.).
Perhaps even features based on the current structure of the time series (e.g. trend, flat, sharp moves).
Types of control models:
Neural Network: The most flexible option, able to learn complex dependencies. Can be a simple FFN.
Another CatBoost model: You can use CatBoost for the control model as well, if the state-describing features fit well into a tabular format.
Logistic regression or SVM: If the relationships between state attributes and expert choice are relatively linear.
Control model output: The control model should produce weights or probabilities for each expert. In the simplest case, the expert with the highest weight is chosen (top-1 routing), or several experts are kept (top-k routing).
Combining outputs:
Once the control model has selected the experts, their predictions are combined.
Weighted averaging: The most common approach. The forecasts of the selected experts are multiplied by the weights given by the control model and summed up.
Voting: If experts give discrete class forecasts, voting can be used.
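A minimal sketch of this combination step, assuming the experts and the gating model are already-fitted CatBoostClassifier objects (all names here are illustrative; for brevity the same feature row is fed to both gate and experts, though in the scheme above the gate would get its own state features):

```python
import numpy as np
from catboost import CatBoostClassifier  # experts and gate are fitted instances

def moe_predict_proba(x_row, experts, gate, k=2):
    # x_row: shape (1, n_features); gate's classes are expert indices 0..n-1.
    gate_w = gate.predict_proba(x_row)[0]        # one weight per expert
    top = np.argsort(gate_w)[-k:]                # top-k routing
    w = gate_w[top] / gate_w[top].sum()          # renormalise selected weights
    # weighted average of the selected experts' class probabilities
    return sum(wi * experts[i].predict_proba(x_row)[0]
               for wi, i in zip(w, top))         # e.g. [p_down, p_flat, p_up]
```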
Advantages of such a scheme for financial time series:
Adapting to market variability: Financial markets are very dynamic. A single model can rarely perform efficiently in all market conditions. MoE allows you to have specialised models for bull markets, bear markets, flat, periods of high or low volatility, etc.
Improved accuracy: Each expert can be deeply optimised for its specific "sub-task", which together can lead to more accurate forecasts than a single universal model.
Interpretability (partially): If experts are trained on well-defined subsets of data (e.g., "market in trend", "market in flat"), it is possible to gain insight into which part of the model is activated in a given situation.
Reducing the "one-size-fits-all" effect: Instead of trying to train one model that will perform well in all conditions, MoE allows you to break the problem down into smaller, manageable pieces.
Robustness: If one expert starts to perform poorly under certain conditions, the controlling model can switch to another, more appropriate expert, increasing the overall robustness of the system.
Challenges and considerations:
Identifying market "modes"/dataset subsets: This is the most difficult part. How to efficiently partition the original dataset for expert training? Manual identification can be time consuming and sub-optimal. Time series clustering or market state features can be used.
Training the control model: The control model should be able to accurately identify the current state and direct the query to the correct expert. The attributes for the control model should be carefully selected.
Expert load balancing: In classical MoE networks there is the problem of "lazy experts" or load imbalance. In your CatBoost case this is less critical, but it is still important that the control model uses experts evenly when appropriate, rather than always selecting the same ones.
Size and complexity: Managing a large number of trained CatBoost models and a control model can be complex from an infrastructure and monitoring perspective.
Example implementation:
Create data subsets:
Divide historical data into periods of high volatility and low volatility.
Use technical indicators (e.g. ADX for trend strength) to separate trending and flat periods.
Cluster time series based on their characteristics (e.g. using t-SNE or UMAP on series features).
Train CatBoost experts:
For each subset of data, train a separate CatBoost model for classification.
Training the control model:
For each data point (time step), create a set of features describing the current state of the market.
As a target variable for the control model, you can use an indicator of which expert would give the best prediction at that time step (or several, if you want top-k routing). This may require a pre-run of all experts on the entire dataset to determine their "best" areas; see the sketch after this list.
Inference:
When a new time series arrives, first compute the features for the control model.
The control model produces weights for the experts.
Select the top-k experts.
Obtain forecasts from the selected experts and combine them (e.g., weighted average).
This scheme has great potential to improve forecasts in complex and dynamic financial markets through specialisation and adaptation.
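A hedged sketch of the step-3 labelling just mentioned: pre-run every expert over the full training set and label each row with the index of the expert most confident in the true class; that index becomes the gating target. The random arrays and the thirds-of-the-data experts below are placeholders for your own regime subsets, and y is assumed to be encoded 0..n_classes-1 in the order of each model's classes_:

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 10))       # stand-in expert features
X_state = rng.standard_normal((600, 4))  # stand-in "market state" features
y = rng.integers(0, 3, 600)              # classes 0..2 (down/flat/up)

# experts: here simply trained on thirds of the data, as a placeholder
# for the regime-based subsets described above
experts = []
for part in np.array_split(np.arange(600), 3):
    m = CatBoostClassifier(iterations=50, verbose=False)
    m.fit(X[part], y[part])
    experts.append(m)

def best_expert_labels(experts, X, y):
    # Probability each expert assigns to the true class of every row;
    # the gating target is the index of the most confident expert per row.
    conf = np.column_stack([m.predict_proba(X)[np.arange(len(y)), y]
                            for m in experts])
    return conf.argmax(axis=1)

gate = CatBoostClassifier(iterations=50, verbose=False)
gate.fit(X_state, best_expert_labels(experts, X, y))
# at inference, gate.predict_proba feeds the top-k weighting shown
# in the moe_predict_proba sketch earlier in the thread
```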