Machine learning in trading: theory, models, practice and algo-trading - page 3710
I found confirmation in an LLM (curve fitting with raisins is already used there):
The Mixture of Experts (MoE) model is an architectural approach that is becoming increasingly popular in large language models (LLMs). It allows the creation of very large models with a huge number of parameters while keeping the computational cost acceptable.
How does Mixture of Experts work?
The basic idea behind MoE is that instead of one giant neural network that processes all the input data, the model consists of many specialised "experts". For each specific input (e.g. a token in a text), only a small subset of these experts are activated.
Here are the main components and steps of MoE:
Experts: These are essentially separate, usually small neural networks (often Feed-Forward Networks (FFN) layers), each specialising in processing certain types of data or performing certain tasks. There may be dozens or even hundreds of them.
Gating Network / Router: This is a separate neural network that takes a token (or other representation of input data) as input and decides which experts to direct it to. It calculates "weights" or "scores" for each expert, indicating their relevance to the token.
Expert selection: Based on the scores obtained from the gating network, a subset of experts is selected. A common strategy is "top-k routing", where the k experts with the highest scores are chosen. For example, in the Mixtral 8x7B model, two of the eight available experts are selected for each token.
Combining outputs: The selected experts process the input token in parallel. Their outputs are then combined, often by weighted summation, where the weights are determined by the gating network. This produces a final output that takes into account the contributions of the most relevant experts.
In this way, only a small part of the whole model is activated for each token, making inference and training more computationally efficient despite the huge overall size of the model.
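To make the routing step concrete, here is a minimal sketch in Python/numpy under simple assumptions: each expert is just a callable that returns a vector (e.g. class probabilities), and the gating scores are assumed to come from a separate gating network. All names are illustrative, not from any specific framework.

```python
import numpy as np

def moe_forward(x, experts, gate_scores, k=2):
    """Top-k routing: evaluate only the k best-scoring experts for input x
    and combine their vector outputs by a weighted sum."""
    gate_scores = np.asarray(gate_scores, dtype=float)
    top_idx = np.argsort(gate_scores)[-k:]            # indices of the k highest scores

    w = np.exp(gate_scores[top_idx])                  # softmax over the selected scores
    w /= w.sum()

    outputs = np.stack([experts[i](x) for i in top_idx])  # only k experts are evaluated
    return (w[:, None] * outputs).sum(axis=0)              # weighted combination
```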
Why does this principle give good results?
The Mixture of Experts principle yields good results for several key reasons:
Increased model capacity without a proportional increase in computational cost:
MoE enables models with hundreds of billions or even trillions of parameters. In traditional "dense" models, every parameter is activated at every computation, making them very expensive to train and infer when the number of parameters is large.
In MoE, only a fraction of the parameters are activated for each input, which significantly reduces the computational cost per step, allowing much larger models to be built without a corresponding increase in cost.
Expert specialisation:
Each expert may specialise in processing certain types of information or solving specific subtasks. For example, one expert may be good at mathematical calculations, another at programming, and a third at generating text on a particular topic.
This allows the model to better handle a wide range of tasks and domains, as it can dynamically select the most appropriate experts for a given situation. This is particularly useful in multi-domain scenarios.
Faster training and inference:
Since only part of the model is activated for each token, the learning process can be accelerated as the amount of computation per token is reduced.
Similarly, during inference (response generation), the model runs faster as the entire giant network does not need to be activated and computed. This results in less latency and higher efficiency.
Scalability:
MoE architecture is easily scalable. One can simply add new experts to solve more complex problems or expand the knowledge domain of the model without drastically increasing the computational cost per input.
Flexibility:
The MoE architecture is very flexible. It can be adapted to different tasks and data types by customising the experts and the gating network.
In general, MoE allows large language models to be both very large (due to the huge number of parameters) and efficient (due to sparse activation), leading to significant improvements in performance and response quality across a wide range of tasks.
Can a similar scheme be applied to financial time series classification? Say I first train many CatBoost models on small subsets of the original dataset; do I then need a control model?
Yes, a similar scheme inspired by the principles of Mixture of Experts (MoE) can be applied to financial time series classification. This is an interesting and promising direction, especially given the complexity and dynamism of financial data.
Your idea of using multiple CatBoost models as experts, with a control model on top, fits very well with the MoE concept.
Let's look at how this can be implemented and why it can be effective:
Proposed MoE scheme for financial time series
Experts (CatBoost Models):
Training on small datasets/subsets: This is key. Instead of random subsets, it makes sense to form subsets that represent different "regimes" or "states" of the market, or different types of financial instruments. For example:
Data for different periods (volatile market, calm market).
Data for different asset classes (stocks, bonds, currencies, commodities).
Data grouped by specific macroeconomic indicators.
Subsets focused on specific patterns or anomalies.
Expert Specialisation: Each trained CatBoost model will be an "expert" specialising in a particular type of market conditions or data. CatBoost is a good fit because it is robust to noise and handles categorical features well, which is common in financial data.
Expert Output: Each CatBoost expert will produce probability predictions for the classification classes (e.g. up, down, neutral); see the sketch below.
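A rough sketch of this stage in Python (the column names `regime` and `target` and the hyperparameters are purely illustrative, not a prescription): one CatBoost classifier is trained per regime subset.

```python
from catboost import CatBoostClassifier

def train_regime_experts(df, feature_cols, target_col="target", regime_col="regime"):
    """Train one CatBoost 'expert' per market-regime subset of the data."""
    experts = {}
    for regime, part in df.groupby(regime_col):
        model = CatBoostClassifier(iterations=500, depth=6,
                                   loss_function="MultiClass", verbose=False)
        model.fit(part[feature_cols], part[target_col])
        experts[regime] = model      # e.g. experts["high_vol"], experts["low_vol"]
    return experts
```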
Control Model (Gating Network / Router):
Purpose: This model is responsible for selecting one or more of the most relevant CatBoost experts for each new observation.
Input of the control model: The control model should receive features that characterise the current state of the time series, but not necessarily the same ones used by the experts for detailed forecasting. These can be:
Macroeconomic indicators.
Volatility indices (VIX).
Trading volumes.
Key technical indicators that quickly reflect the "mode" of the market (e.g. moving averages, MACD, RSI on higher timeframes).
Instrument-specific features (sector, capitalisation, etc.).
Perhaps even features based on the current structure of the time series (e.g. trend, flat, sharp movements).
Types of control models:
Neural Network: The most flexible option, able to learn complex dependencies. Can be a simple FFN.
Another CatBoost model: You can use CatBoost for the control model as well, if the features characterising the market state fit well into a tabular format.
Logistic regression or SVM: If the relationships between state attributes and expert choice are relatively linear.
Control model output: The control model should produce weights or probabilities for each expert. In the simplest case, the expert with the highest weight (top-1 routing) or multiple experts (top-k routing) can be selected.
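One possible sketch of such a control model, using another CatBoost classifier as mentioned above (hyperparameters are illustrative; `best_expert_idx` is the kind of target described in the implementation example further down):

```python
from catboost import CatBoostClassifier

def train_gating_model(state_features, best_expert_idx):
    """Control (gating) model: maps market-state features to a probability
    distribution over the experts."""
    gate = CatBoostClassifier(iterations=300, depth=4,
                              loss_function="MultiClass", verbose=False)
    gate.fit(state_features, best_expert_idx)
    return gate   # gate.predict_proba(x) then serves as the expert weights
```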
Combining outputs:
Once the control model has selected the experts, their predictions are combined.
Weighted averaging: The most common approach. The forecasts of the selected experts are multiplied by the weights given by the control model and summed up.
Voting: If experts give discrete class forecasts, voting can be used.
Advantages of such a scheme for financial time series:
Adapting to market variability: Financial markets are very dynamic. A single model can rarely perform efficiently in all market conditions. MoE allows you to have specialised models for bull markets, bear markets, sideways (flat) markets, and periods of high or low volatility.
Improved accuracy: Each expert can be deeply optimised for its specific "sub-task", which together can lead to more accurate forecasts than a single universal model.
Interpretability (partially): If experts are trained on well-defined subsets of data (e.g., "trending market", "ranging market"), it is possible to gain insight into which part of the model is activated in a given situation.
Reducing the "one-size-fits-all" effect: Instead of trying to train one model that will perform well in all conditions, MoE allows you to break the problem down into smaller, manageable pieces.
Robustness: If one expert starts to perform poorly under certain conditions, the control model can switch to another, more appropriate expert, increasing the overall robustness of the system.
Challenges and considerations:
Identifying market "regimes"/dataset subsets: This is the most difficult part. How do you partition the original dataset efficiently for expert training? Manual identification can be time-consuming and sub-optimal. Time series clustering or market-state features can be used.
Training the control model: The control model must be able to accurately identify the current market state and direct the query to the correct expert. The features for the control model should be carefully selected.
Expert load balancing: In classical MoE networks there is the problem of "lazy experts" or load imbalance. In your CatBoost case this is less critical, but it is still important that the control model uses experts evenly when appropriate, rather than always selecting the same ones.
Size and complexity: Managing a large number of trained CatBoost models and a control model can be complex from an infrastructure and monitoring perspective.
Example implementation:
Create data subsets:
Divide historical data into periods of high volatility and low volatility.
Use technical indicators (e.g. ADX for trend strength) to separate trending and ranging periods.
Cluster time series based on their characteristics (e.g., using t-SNE or UMAP on series features).
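For the first of these splits, a minimal sketch with pandas (the 20-bar window and the 0.7 quantile cut-off are arbitrary, illustrative choices):

```python
import pandas as pd

def label_volatility_regime(close: pd.Series, window: int = 20, quantile: float = 0.7):
    """Split bars into high- and low-volatility regimes by rolling realised volatility."""
    returns = close.pct_change()
    vol = returns.rolling(window).std()
    threshold = vol.quantile(quantile)                 # global cut-off (illustrative)
    return (vol > threshold).map({True: "high_vol", False: "low_vol"})
```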
Train CatBoost experts:
For each subset of data, train a separate CatBoost model for classification.
Train the control model:
For each data point (time step), create a set of features describing the current state of the market.
As a target variable for the control model, you can use an indicator of which expert would give the best prediction at that time step (or several if you want top-k routing). This may require a pre-run of all experts on the entire dataset to determine their "best" areas.
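A sketch of how that target could be built, under the assumption that all experts were trained on the same class set in the same order and that `y_true` is integer-encoded (otherwise the probability columns must be aligned first):

```python
import numpy as np

def best_expert_per_row(X, y_true, experts):
    """For each row, pick the expert that assigns the highest probability
    to the true class - this index becomes the control model's target."""
    y_true = np.asarray(y_true)
    # shape (n_experts, n_rows): each expert's probability of the true class
    scores = np.stack([
        m.predict_proba(X)[np.arange(len(y_true)), y_true]
        for m in experts
    ])
    return scores.argmax(axis=0)
```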
Inference:
When a new time series arrives, first compute the features for the control model.
The control model produces weights for the experts.
Select the top-k experts.
Obtain forecasts from the selected experts and combine them (e.g., weighted average).
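Putting these inference steps together in one sketch (it assumes the experts are kept in a list and were labelled 0..N-1 when the gating model was trained, so the gate's probability columns line up with the list indices; `state_row` and `expert_row` are 1D feature vectors):

```python
import numpy as np

def moe_predict(state_row, expert_row, gate, experts, k=2):
    """Gate -> top-k experts -> weighted average of class probabilities."""
    gate_probs = gate.predict_proba([state_row])[0]        # one weight per expert
    top_idx = np.argsort(gate_probs)[-k:]                   # top-k routing
    w = gate_probs[top_idx] / gate_probs[top_idx].sum()     # renormalise the weights
    probs = np.stack([experts[i].predict_proba([expert_row])[0] for i in top_idx])
    return (w[:, None] * probs).sum(axis=0)                 # final class probabilities
```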
This scheme has great potential to improve forecasts in complex and dynamic financial markets through specialisation and adaptation.
discovered a higher power
Hi Maxim, tell me, do you really have no problem writing such strictly structured posts?
ML prompters, ahahahahahaha
discovered a higher power
No one seems to have disputed that AI chats are an improvement over googling. But only the prompters have managed to make an object of worship out of them.
Not exactly worship, more like a handy time-saving tool
intellectus
To save time, I don't read these walls of text))) Well, maybe the first 5-10 lines.
Everyone saves time, and the forum fills up with text. Everyone wins.
As expected, soon the whole internet will be filled with it.
I remember the policy of one well-known "people's" broker on its forum: award points for posts and then pay them out.
Within a few years the forum turned into a collection of near-synonymous posts "generated" by live human LLM-agents.
Any point in visiting the threads disappeared immediately)) And that was back before AI chats even existed.