Machine learning in trading: theory, models, practice and algo-trading - page 3466

 
mytarmailS #:
But I tested my usual patterns on new data and built something like the Random Forest concept: I have 200 patterns, and a trade is triggered when more than 100 of them fire at once.

You could feed the pattern responses into boosting - you would get a model for interpreting the signals from the patterns.
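A minimal sketch of that idea, assuming each of the 200 patterns yields a binary fired/not-fired response per bar (the data here is synthetic and the use of sklearn's GradientBoostingClassifier is illustrative, not anyone's actual setup):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# X: one binary column per pattern (1 = the pattern fired on this bar),
# y: the trade outcome on the following bar (1 = profit). Synthetic here.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 200))
y = rng.integers(0, 2, size=5000)

# Simple majority-vote trigger: trade when more than 100 patterns fire at once.
vote_signal = X.sum(axis=1) > 100

# Boosting over the same responses learns which pattern combinations matter,
# instead of weighting every pattern equally.
model = GradientBoostingClassifier().fit(X, y)
trade_proba = model.predict_proba(X)[:, 1]  # model's read of the pattern signals
```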

 
Aleksey Vyazmikin #:

You could feed the pattern responses into boosting - you would get a model for interpreting the signals from the patterns.

No, I can't; I would do that if I could.

The features are complex and in such a form that it is impossible to make a dataset out of them - you end up with tens of millions of columns.
 
mytarmailS #:
https://youtu.be/c-yf4nLgq2Q

Best comment of the year: "Casino with Black-Scholes!"

 
mytarmailS #:
No, I can't; I would do that if I could.

The features are complex and in such a form that it is impossible to make a dataset out of them - you end up with tens of millions of columns.

I was writing about the patterns, of which, as I understand it, there are relatively few. You can build a dataset from them.

I don't think it makes sense to cram a huge number of predictors into boosting. Better to use a genetic algorithm with random subsamples of predictors on new mutations in the population. Maybe even build stumps separately first, then run the genetic algorithm with this approach....

However, CatBoost can also take a subset of the predictors for each tree - you get more trees, but less memory is required for building.
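For reference, a sketch of that option: CatBoost's `rsm` parameter (random subspace method) samples a fraction of the features when choosing splits, which is what keeps the memory footprint down; the 0.1 value here is just an illustration.

```python
from catboost import CatBoostClassifier

# rsm < 1.0 makes CatBoost consider only a random share of the predictors
# when selecting each split, trading a larger ensemble of weaker trees
# for lower memory usage during building (supported in CPU training).
model = CatBoostClassifier(
    iterations=2000,  # more trees to compensate for weaker individual trees
    rsm=0.1,          # use 10% of the features per split selection
    verbose=False,
)
# model.fit(X_train, y_train)
```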

And in general, check the predictors for similarity - perhaps they can be reduced by an order of magnitude....
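A minimal sketch of such a check, assuming the predictors fit in a pandas DataFrame (the 0.95 threshold is an arbitrary illustration):

```python
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop every column whose absolute correlation with an already-kept
    column exceeds the threshold."""
    corr = df.corr().abs()
    keep = []
    for col in corr.columns:
        if all(corr.loc[col, kept] < threshold for kept in keep):
            keep.append(col)
    return df[keep]

# reduced = drop_correlated(features, threshold=0.95)
```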

 
mytarmailS #:
You're writing nonsense.

Show us (at least give us a link to) the standard zigzag.

 
Maxim Kuznetsov #:

Show us the damn standard zigzag.

Here is the rule: "if, then".

If the price is lower, and if some other condition holds, a line is drawn to the new extremum.

How can that be interpreted "differently"?
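For illustration, a minimal sketch of that if/then logic in Python (the reversal threshold and all names are assumptions, not the standard indicator's actual code):

```python
import numpy as np

def zigzag(prices: np.ndarray, threshold: float = 0.02) -> list[tuple[int, float]]:
    """Return (index, price) pivots: a new extremum is confirmed when the
    price reverses against the current leg by more than `threshold`."""
    pivots = [(0, prices[0])]
    direction = 0                   # 0 = undecided, +1 = up leg, -1 = down leg
    ext_i, ext_p = 0, prices[0]     # running extremum of the current leg
    for i, p in enumerate(prices[1:], start=1):
        if direction >= 0 and p > ext_p:    # price is higher: extend the up leg
            ext_i, ext_p = i, p
            direction = 1
        elif direction <= 0 and p < ext_p:  # price is lower: extend the down leg
            ext_i, ext_p = i, p
            direction = -1
        elif abs(p - ext_p) / ext_p > threshold:
            pivots.append((ext_i, ext_p))   # confirm the extremum, start a new leg
            direction = -direction
            ext_i, ext_p = i, p
    return pivots
```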

 
mytarmailS #:
No, I can't; I would do that if I could.

The features are complex and in such a form that it is impossible to make a dataset out of them - you end up with tens of millions of columns.

Are the features correlated?

 
СанСаныч Фоменко #:

Are the features correlated?

No, of course not.
 

Greetings, everyone!


I have this thought. Suppose there is a trading system that places limit orders to buy and sell at some distance from the midprice, forming a spread. When the same system accumulates a position, it shifts the quoting corridor so as not to get stuck in the position, thereby managing risk. The trading system is based on the ideas of Avellaneda and Stoikov and their optimal market-making model. That is, the logic of this model is used to determine the quoting distance: the trading intensity is estimated, and from it and a few initial parameters (gamma, delta) the system determines how wide we quote and how much we are afraid of getting stuck in a position.
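For context, a minimal sketch of the Avellaneda-Stoikov quoting rule the post refers to: the reservation price is shifted against the inventory, and the spread depends on the risk aversion gamma and the order-arrival intensity parameter k (variable names here are generic, not the poster's code):

```python
import math

def as_quotes(mid: float, inventory: float, gamma: float,
              sigma: float, k: float, tau: float) -> tuple[float, float]:
    """Avellaneda-Stoikov (2008): reservation price and optimal spread.
    mid: midprice, inventory: signed position, gamma: risk aversion,
    sigma: volatility, k: order-arrival decay, tau: time to horizon."""
    # Reservation price: shifted against the inventory to avoid getting stuck.
    r = mid - inventory * gamma * sigma**2 * tau
    # Optimal total spread around the reservation price.
    spread = gamma * sigma**2 * tau + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return r - spread / 2.0, r + spread / 2.0  # (bid, ask)
```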

The problem is that in mean-reverting regimes the system works well, but during strong trends it starts to falter: it accumulates a position against the direction of the midprice move, which results in a drawdown. That is a classic problem for this kind of trading system.

After reading about meta-labeling and similar techniques, as well as the articles by @Maxim Dmitrievsky, I came up with this idea: what if the system's parameters are not selected once by optimisation (genetic algorithms or grid search), but are changed to adapt to the market? That is, under certain market conditions, when there are no trends, you can quote tighter to the midprice; under others, it is better to quote wider. Ideally the model itself should be able to do this, but it only monitors volatility and trading intensity (how far from the midprice in the book the market orders hit, i.e. how deep the book gets eaten). It does not see volatility on a shorter window, it adjusts to volatility only linearly, and it does not see other things happening in the market. The idea is to augment it with a machine learning model that looks a little wider and tunes the parameters of the original model, or stops trading altogether.

I decided to proceed as follows. I have a backtester that tests the trading system on a piece of data very close to reality. I run 50-100 variants of more or less good parameter combinations through the backtester, collecting features along the way, and train a model that, given the features and a parameter combination, can predict whether a time interval on those parameters will be profitable. The trading system updates its coefficients about once a minute. During that minute the system manages to place up to 600 orders on both sides and either earn or lose something. So for each minute I collect features: the coefficients of the base logic, RSI, VWAP normalised to the midprice, volume and so on, as well as the money earned in dollars for that minute. The results of the backtester runs are then collected into one big dataframe, the profit earned per minute is chosen as the dependent variable, the data is shifted by one step so that the outcome of the next interval can be predicted, and a RandomForestClassifier is trained. The train/test 80/20 accuracy score is 95-97%.
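A minimal sketch of that pipeline, assuming the backtester output is already in a pandas DataFrame with one row per minute (all column names are illustrative):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def train_minute_classifier(df: pd.DataFrame) -> RandomForestClassifier:
    """df: one row per minute per backtest run, with market features,
    the parameter combination used, and the PnL earned that minute."""
    feature_cols = ["gamma", "delta", "rsi", "vwap_norm", "volume"]  # illustrative
    df = df.copy()
    # Shift the target back one step: features at minute t predict
    # whether minute t+1 is profitable.
    df["next_pnl"] = df["pnl"].shift(-1)
    df = df.dropna(subset=["next_pnl"])
    df["target"] = (df["next_pnl"] > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[feature_cols], df["target"], test_size=0.2, shuffle=False
    )
    model = RandomForestClassifier(n_estimators=300).fit(X_tr, y_tr)
    print("test accuracy:", model.score(X_te, y_te))
    return model
```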

The model is used as follows: it is plugged into the running trading system (live or in the backtester), we keep accumulating features, and we hold an array of possible parameter combinations for the trading system. To decide which parameters to use for the next minute, we run through the model the current features describing the market together with each parameter combination, and look for the combination that, given the current market features, yields the highest predicted probability of a profit (there are only two class labels, 0 and 1). If no such combination is found, we simply do not trade the next minute. If one is found, we trade with the parameters that are optimal from the model's point of view.
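A sketch of that selection step, continuing the assumptions above (the 0.5 cut-off for "no combination found" is my illustrative reading of the post):

```python
import pandas as pd

def pick_parameters(model, market_features: dict, param_grid: list[dict],
                    min_proba: float = 0.5):
    """Score every parameter combination against the current market
    features; return the best one, or None (= do not trade next minute)
    if no combination clears the probability cut-off."""
    rows = [{**market_features, **params} for params in param_grid]
    X = pd.DataFrame(rows)                        # column order must match training
    proba_profit = model.predict_proba(X)[:, 1]   # P(class 1 = profitable minute)
    best = proba_profit.argmax()
    if proba_profit[best] < min_proba:
        return None
    return param_grid[best]
```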


In-sample this idea greatly improves trading performance: Sharpe rises sharply, drawdowns fall. Out-of-sample, things fall apart somewhat. Perhaps I need to accumulate more data to train the model, but since this is essentially HFT and the data starts to weigh gigabytes after a dozen days, that becomes a bit of a problem.

There was also an attempt to use regression instead of classification and select the parameters with the highest expected profit rather than the most probable one. But in that case the results fall apart even faster.


I would ask you to criticise the idea, even throw tomatoes. If there is something constructive to suggest - great. Maybe someone has already done something similar? Am I looking in the right direction at all?

 
Arty G #:

Greetings, everyone!

I would ask you to criticise the idea, even throw tomatoes. If there is something constructive to suggest - great. Maybe someone has already done something similar? Am I looking in the right direction at all?

As far as I understand, you need to find the market regimes in which this TS works well. For example, the regimes can be determined by volatility.

In my last article I suggested clustering by volatility and then choosing the best clusters for trading. If you have not read it, it may be useful.
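A minimal sketch of that kind of regime filter, assuming nothing about the article's actual code: cluster a rolling volatility estimate with k-means, then trade only in the clusters that backtest well.

```python
import pandas as pd
from sklearn.cluster import KMeans

def volatility_clusters(close: pd.Series, window: int = 60,
                        n_clusters: int = 3) -> pd.Series:
    """Label every bar with a volatility regime via k-means clustering."""
    vol = close.pct_change().rolling(window).std().dropna()
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(
        vol.to_numpy().reshape(-1, 1)
    )
    return pd.Series(labels, index=vol.index, name="regime")

# regimes = volatility_clusters(close_prices)
# Keep only the regimes where the TS earned money in-sample, trade those OOS.
```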
