Machine learning in trading: theory, models, practice and algo-trading - page 1782

 
Valeriy Yastremskiy:

What do you share? And what about increments doesn't suit you? They are essentially speeds normalized to time. But I can't manage without averaging, and once you start taking the averages into account you quickly end up in a maze. There has to be a working middle ground somewhere: the last tick or bar alone is not enough, and much more than that is already wilderness.

Two or more increments with different lags, on different numbers of clusters.

Since there is no functional dependence between a pair of increments, the cloud just gets split in half, and so on. We need something stricter than increments; maybe they should be transformed in some way.

Examples
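
As an illustration, a minimal sketch of clustering a pair of lagged increments (synthetic prices, scikit-learn KMeans; the lags and cluster counts are arbitrary assumptions, not the poster's actual setup). With nearly independent increments the partition indeed tends to just slice the cloud into sectors:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic close prices as a stand-in for real quotes
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 0.1, 5000))

# A pair of increments with different lags: each point of the "cloud" is (r_1, r_5)
lag_a, lag_b = 1, 5
r_a = close[lag_b:] - close[lag_b - lag_a:-lag_a]   # 1-bar increment
r_b = close[lag_b:] - close[:-lag_b]                # 5-bar increment
cloud = np.column_stack([r_a, r_b])

# Different numbers of clusters: without a functional dependence between the
# increments, KMeans mostly cuts the cloud into roughly equal sectors
for k in (2, 4, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(cloud)
    print(k, np.bincount(km.labels_, minlength=k))
```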


 
Maxim Dmitrievsky:

Two or more increments with different lags, on different numbers of clusters.

Since there is no functional dependence between a pair of increments, the cloud just gets split in half, and so on. We need something stricter than increments; maybe they should be transformed in some way.

Examples


I don't understand the bit about a pair of increments. Over the last 2 bars, or something else?

I have an idea in the direction of speeds and averages, too. The system should be trained on different TFs, on the interaction of different TFs, and there should also be some tick-level stuff, i.e. tick-level behaviour at the moments when the TS makes its decisions.

Different TFs are essentially just a decrease in the weight of features the further they are from the current state. Semko has his own system for that, but I like TFs even better: there is uniformity and some account of the extrema.

It occurred to me: we place orders into the price cloud, so the position will be in drawdown 99% of the time. But how can we estimate that we were not mistaken? By the nearest extrema: if the nearest extrema are negative, we may still be able to close without a loss.

 

What can we measure on the last couple of bars and over a history of 120 bars? On the monthly TF that is 10 years. Seems like enough.

Velocities of the MAs with periods 2, 14, 30, 120, 480, finding their maxima and kinks.

Spreads between adjacent MAs, finding their maxima and kinks.

Maximum deviations of price from the MAs, though these are usually just the actual price extrema.

Average trend durations, highlighting the maxima and minima.

Average ranges within trends, à la Donchian.

Trends and flats can also be separated, along with their durations.

Average duration of trends within a flat. Trends of the lower TFs inside those of the higher ones.

Average duration of trends.

And it seems that different parameters become significant depending on the others, and the connection is not obvious. Linking the lower TF with the higher one is the first thing that comes to mind, but that is clearly not enough, and I cannot find any logic in the links yet.
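
A rough sketch of what some of these window features could look like in code (MA velocities, spreads between adjacent MAs, maximum deviation of price from each MA, all over the last 120 bars). The use of simple MAs and the pandas layout are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

PERIODS = [2, 14, 30, 120, 480]   # MA periods mentioned above; simple MAs assumed

def window_features(close: pd.Series, window: int = 120) -> dict:
    """Features measured on the last `window` bars."""
    feats = {}
    mas = {p: close.rolling(p).mean() for p in PERIODS}
    for p, ma in mas.items():
        vel = ma.diff()                                   # "velocity" of the MA
        feats[f"ma{p}_vel_last"] = vel.iloc[-1]
        feats[f"ma{p}_vel_absmax"] = vel.tail(window).abs().max()
        dev = close - ma                                  # deviation of price from the MA
        feats[f"ma{p}_dev_absmax"] = dev.tail(window).abs().max()
    for p1, p2 in zip(PERIODS[:-1], PERIODS[1:]):
        spread = mas[p1] - mas[p2]                        # spread between adjacent MAs
        feats[f"spread_{p1}_{p2}_last"] = spread.iloc[-1]
        feats[f"spread_{p1}_{p2}_absmax"] = spread.tail(window).abs().max()
    return feats

# Example on synthetic prices
rng = np.random.default_rng(1)
close = pd.Series(100 + np.cumsum(rng.normal(0, 0.1, 2000)))
print(window_features(close))
```

Kinks of an MA could be taken as sign changes of the velocity, and trend/flat segmentation would have to be defined separately.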

 
Valeriy Yastremskiy:

I don't understand the bit about a pair of increments. Over the last 2 bars, or something else?

Two time series with different lags. You can cluster anything, but then it all runs into a lack of understanding of the subject area and of what is being clustered and why. I haven't seen any successful examples on the Internet. By the way, I wanted to extract clusters instead of seasonal components, and forgot about it, started shoving everything into ML... oh well... but that would be a different study anyway.
 
mytarmailS:

Well, time is an indirect indicator of volatility, which is seasonal: there are active trading hours and passive ones.

I agree, I didn't take it into account.

mytarmailS:

Even if you can save it, then in order to train the model you would have to load that matrix into the environment, and that is where it would all end )) or rather even earlier, at the stage of forming the matrix of predictors.

Try CatBoost. In any case, I can train it and see the result.
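
A minimal CatBoost sketch of the kind of run meant here, with a random placeholder matrix in place of the real predictors (the shapes, split and parameters below are arbitrary assumptions):

```python
import numpy as np
from catboost import CatBoostClassifier

# Placeholder data: in practice X is the matrix of predictors, y the target column
rng = np.random.default_rng(2)
X = rng.normal(size=(5_000, 566))
y = rng.integers(0, 2, size=5_000)

# Simple time-ordered split: the last rows are held out as the test set
split = 4_000
model = CatBoostClassifier(iterations=300, depth=6, learning_rate=0.03, verbose=100)
model.fit(X[:split], y[:split], eval_set=(X[split:], y[split:]))

pred = model.predict(X[split:]).ravel().astype(int)
print("test accuracy:", (pred == y[split:]).mean())
```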

mytarmailS:

Wow, a gig is not small; I wonder how many features you have?

566 in this sample.

mytarmailS:

What is the genetic tree?


1) simple )

2) How is that? And how do you adjust the predictors to the ZZ?

3) Well, you have candles referenced to the open or something like that; that is already a distortion, because they should be by close, and right away there is a lot of confusion: how to build the features, how to make the target, etc. (unnecessary pain). If you change something for yourself, you should always keep the original for others)

1. It's a script in R that builds a tree using a genetic algorithm to select the splits. I'm not really versed in it; it's Doc's work.


2. I use ZZ-based predictors; obviously they are more effective if both they and the target are calculated on the same ZZ.

3. At the beginning of a bar I do not know its OHLC, so that is how I wrote it down, the way it happens in real life.

Bottom line, should I redo it or is there no point?

 
Aleksey Vyazmikin:

Bottom line, should I redo it or is there no point?

CatBoost will not help; the problem is the size of the data. I will not even be able to create the features, you will not even get to the training...

Make a sample of 50k rows; let it be small, let it be not serious, let it be easier to overfit, ... The task is not to make a production robot, but simply to reduce the error through joint creative work; the knowledge gained can then be transferred to any instrument and market. 50k is quite enough to see which features mean something.

Aleksey Vyazmikin:

3. At the beginning of a bar I do not know its OHLC, so that is how I wrote it down, the way it happens in real life.

Well, if you do not know the OHLC then you do not need to write it; why shift the whole OHLC? Nobody does that. You just need to shift the ZZ by one step, as if looking one step into the future for training, and that's all. Have you read at least one of Vladimir Perervenko's articles on deep learning? Please do. It is very uncomfortable when there are already well-established, optimal ways of handling the data that everyone is used to, and someone tries to do the same thing in their own way; it is kind of pointless and annoying, and the cause of many errors for people who try to work with such author's data.


If after all this you still want to do something, I have the following requirements:

1) The data: 50-60k rows, no more, preferably one file; we just agree that the last n candles will be the test.

2) The data preferably without splicing, since then it is possible to consider not only the latest prices but also support and resistance, which is impossible with spliced data.

3) The target should already be included in the data.

4) Data in the format date, time, o, h, l, c, target.
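
A sketch of what a file in that layout could look like, with the target made from a very simplified ZigZag-style direction shifted one step into the future (as suggested above). The threshold, the synthetic OHLC and the ZZ logic here are placeholders, not anyone's actual indicator:

```python
import numpy as np
import pandas as pd

def zigzag_direction(close: np.ndarray, threshold: float = 0.002) -> np.ndarray:
    """Simplified ZZ-style direction: +1 in an up-swing, -1 in a down-swing,
    switching when price retraces by `threshold` from the last extreme."""
    direction = np.zeros(len(close), dtype=int)
    d, ext = 1, close[0]
    for i, p in enumerate(close):
        if d == 1:
            ext = max(ext, p)
            if p < ext * (1 - threshold):
                d, ext = -1, p
        else:
            ext = min(ext, p)
            if p > ext * (1 + threshold):
                d, ext = 1, p
        direction[i] = d
    return direction

# Build a small OHLC frame in the requested layout: date, time, o, h, l, c, target
rng = np.random.default_rng(3)
c = 100 + np.cumsum(rng.normal(0, 0.05, 1000))
o = np.r_[c[0], c[:-1]]
h = np.maximum(o, c) + rng.uniform(0, 0.05, 1000)
l = np.minimum(o, c) - rng.uniform(0, 0.05, 1000)
idx = pd.date_range("2020-01-01", periods=1000, freq="15min")

df = pd.DataFrame({
    "date": idx.strftime("%Y.%m.%d"),
    "time": idx.strftime("%H:%M"),
    "o": o, "h": h, "l": l, "c": c,
    # target = ZZ direction shifted one step back, i.e. the label "looks" one bar ahead
    "target": pd.Series(zigzag_direction(c)).shift(-1),
})
df = df.dropna()                      # the last bar has no future label
df["target"] = df["target"].astype(int)
df.to_csv("sample.csv", index=False)
print(df.head())
```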


Or should I create a dataset?

 
Maxim Dmitrievsky:
Two time series with different lags. You can cluster anything, but then it all runs into a lack of understanding of the subject area and of what is being clustered and why. I haven't seen any successful examples on the Internet. By the way, I wanted to extract clusters instead of seasonal components, and forgot about it, started shoving everything into ML... oh well... but that would be a different study anyway.

It happens; logic does not tolerate any bullshit or messing around)))) ... There are problems with understanding so far. All I have is averaging, thinning and GA with training on fairly short data. I have not seen any work on separating the characteristics of a series either. On the one hand, the analysis of a series should be identical across different TFs. And there should be criteria for moving to the lower timeframe: say, if trends with sufficient range and speed are identified on the lower timeframe, then it is possible to trade them even against the trend of the higher timeframe. But that is just logic. We should somehow group the characteristics and look at the different behaviours of the series, if we approach it from the other direction.

At the nuclear power plant we monitored 19 parameters. There was a table with 3 to 7 parameters for when the zone goes red and the rods have to be dropped. There was no single decisive parameter there either, and they were not interrelated. Ours is different, of course, but the time scale is too wide, and there is no connection, or not always a connection, between tick and monthly behaviour. In general, we should look at the connections between the parameters and at how long those connections persist.

But it is still very difficult.

 
Valeriy Yastremskiy:

It happens; logic does not tolerate any bullshit or messing around)))) ... There are problems with understanding so far. All I have is averaging, thinning and GA with training on fairly short data. I have not seen any work on separating the characteristics of a series either. On the one hand, the analysis of a series should be identical across different TFs. And there should be criteria for moving to the lower timeframe: say, if trends with sufficient range and speed are identified on the lower timeframe, then it is possible to trade them even against the trend of the higher timeframe. But that is just logic. We should somehow group the characteristics and look at the different behaviours of the series, if we approach it from the other direction.

At the nuclear power plant we monitored 19 parameters. There was a table with 3 to 7 parameters for when the zone goes red and the rods have to be dropped. There was no single decisive parameter there either, and they were not interrelated. Ours is different, of course, but the time scale is too wide, and there is no connection, or not always a connection, between tick and monthly behaviour. In general, we should look at the connections between the parameters and at how long those connections persist.

But it is still very difficult.

I can't walk past a bomber with a nuclear warhead without making a joke :)
 
Maxim Dmitrievsky:
I can't walk past a bomber with a nuclear warhead without making a joke :)

What would we do without them in this wilderness)))) It all started with the nuclear stuff: a probabilistic calculator with averages, feedback and a Bayesian confidence criterion, that's something))) Apparently the parameters will also have to be picked manually at first. And there are a lot of them, too.

In general, the idea is to take a series of about 120 bars and extract all sorts of stuff from it in different variants. Measuring and training only on the current states is no good.

 
Valeriy Yastremskiy:

What would we do without them in this wilderness)))) It all started with the nuclear stuff: a probabilistic calculator with averages, feedback and a Bayesian confidence criterion, that's something))) Apparently the parameters will also have to be picked manually at first. And there are a lot of them, too.

In general, the idea is to take a series of about 120 bars and extract all sorts of stuff from it in different variants. Measuring and training only on the current states is no good.

What do you mean by current states? If it is about clusters, you just need to run the stats on the new data. If they are the same, then you can build a TS.
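
A sketch of that kind of check: fit clusters on one stretch of a series of lagged increments, then compare the cluster occupancy statistics on a later stretch (synthetic data, KMeans, an arbitrary 5% stability threshold; all of it is assumed for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
close = 100 + np.cumsum(rng.normal(0, 0.1, 10_000))
r1 = np.diff(close)                                  # 1-bar increments
r5 = close[5:] - close[:-5]                          # 5-bar increments
X = np.column_stack([r1[4:], r5])                    # pair of lagged increments as features

half = len(X) // 2
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X[:half])

# Cluster frequencies on the training stretch vs the new data
freq_old = np.bincount(km.labels_, minlength=4) / half
freq_new = np.bincount(km.predict(X[half:]), minlength=4) / (len(X) - half)
print("old:", freq_old.round(3))
print("new:", freq_new.round(3))
print("stable enough:", np.max(np.abs(freq_old - freq_new)) < 0.05)
```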
