Machine learning in trading: theory, models, practice and algo-trading - page 644

 
Alexander_K2:

Yes! I forgot to tell you.

A state is taken to be a set of data that characterizes a particle almost completely. In other words, it is a data set - simply put, a sample of a given size - together with its characteristics: kurtosis, skewness, negentropy, etc.

That is, with the confidence of R. Feynman one can assert the following: having correctly determined the sample size for a specific pair, and having computed from history the characteristic mean values of these coefficients for that sample size, one can predict that a system which currently shows a certain set of parameter values will, within a certain time interval, pass into a state with its stable parameters.

This is what I expect from this thread. If you need help in determining the required sample size, write to me and I will try to help.
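A minimal sketch of the sample characteristics listed above (kurtosis, skewness, negentropy), computed for one sample of tick increments. The moment-based negentropy approximation skew^2/12 + excess_kurtosis^2/48 and the synthetic increments are assumptions for illustration, not necessarily what Alexander_K2 actually computes.

# Characteristics of one "state", i.e. one sample of tick increments
state_characteristics <- function(x) {
  z <- (x - mean(x)) / sd(x)            # standardize the sample
  skew <- mean(z^3)                     # skewness
  kurt <- mean(z^4) - 3                 # excess kurtosis
  c(skewness   = skew,
    kurtosis   = kurt,
    # common moment-based approximation of negentropy (an assumption here)
    negentropy = skew^2 / 12 + kurt^2 / 48)
}

set.seed(0)
increments <- rnorm(5000, sd = 1e-5)    # stand-in for one sample of tick increments
state_characteristics(increments)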

Then a question. Suppose there is an autoregressive process, i.e. a dependence of present states on past states.

How does one correctly select a combination of increments with different lags in order to predict the system? Should the lags be constant or vary over time, and if they should change, how do we determine this (other than by banal enumeration)?

Because if there is a combination of increments that stably predicts the target, then training a neural network on it is no problem.

 
Maxim Dmitrievsky:

Then a question. Suppose there is an autoregressive process, i.e. a dependence of present states on past states.

How does one correctly select a set of increments with different lags in order to predict the system? Should the lags be constant or vary over time, and if they should change, how do we determine this (other than by banal enumeration)?

There is a point here that obviously causes difficulties.

The data set is not local in time.

This is a fundamental point!

That is, when working in time we have to understand that within a strictly defined time interval we are dealing with a different amount of data each time. This is the stumbling block: trying to work with a sample of a fixed number of specific values, we end up with an observation window that "floats" in time.

I get around this problem in 2 ways:

1. Work exclusively with a particular set of ticks (a fixed sample size) - it is the same for a given pair once and for all. But this set takes a different amount of time to form each time, so forecasts are impossible.

2. Work only with a constant time window. In this case, conversely, the number of ticks in the window floats. How to get around this difficulty? The answer is to introduce "pseudo-states", i.e. to pad the amount of data in the observation window up to the required sample size. I do this by forcing the work into an exponential time scale: within the strictly defined time window I read the data at exponentially spaced instants, regardless of whether a tick actually arrived at that instant or not. In this case forecasts become possible (a sketch of this kind of reading follows this post). The only thing is, I doubt that exponential intervals are the right solution; Feynman worked with uniform intervals.

If you choose way 1, the prediction problem is unsolvable in principle.

Way 2 is solvable, but no such tick archives exist.

That is where I am stuck, and for now I am simply recording my own archives. I see no other way, alas...
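A minimal sketch of how such exponential reading of a fixed time window could look. The window length, the number of readings and the tick data frame ticks (columns time and price) are illustrative assumptions, not Alexander_K2's actual procedure.

# Read a fixed time window at exponentially spaced instants, taking the last
# observed tick price at each instant ("pseudo-states").
# 'ticks' is assumed to be a data.frame with columns 'time' (seconds, sorted)
# and 'price'.
exp_read_window <- function(ticks, t_end, window_sec = 3600, n_points = 300) {
  t_start <- t_end - window_sec
  # exponentially spaced offsets covering the window from t_start to t_end
  offsets <- window_sec * (exp(seq(0, 1, length.out = n_points)) - 1) / (exp(1) - 1)
  read_times <- t_start + offsets
  # index of the last tick at or before each reading instant,
  # whether or not a tick actually arrived exactly at that instant
  idx <- findInterval(read_times, ticks$time)
  idx[idx == 0] <- NA                       # no tick observed yet
  data.frame(time = read_times, price = ticks$price[idx])
}

# usage on synthetic ticks
set.seed(1)
ticks <- data.frame(time  = cumsum(rexp(5000, rate = 2)),
                    price = 1.10 + cumsum(rnorm(5000, sd = 1e-5)))
states <- exp_read_window(ticks, t_end = max(ticks$time))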

 
Alexander_K2:

2. Work only with a constant time window. In this case, conversely, the number of ticks in the window floats. How to get around this difficulty? The answer is to introduce so-called "pseudo-states", i.e. to pad the amount of data in the observation window up to the required sample size. I do this by forcing the work into an exponential time scale: within the strictly defined time window I read the data at exponentially spaced instants, regardless of whether a tick actually arrived at that instant or not. In this case forecasts become possible. The only thing is, I doubt that exponential intervals are the right solution; Feynman worked with uniform intervals.

Actually, exponential intervals seem to be exactly the right solution to the constant-window problem, as far as I understand it (they should be). Suppose the training sample is 1000 bars (or ticks), it does not matter. Of course, with a sliding window the patterns change and the neural network has to be retrained. But if our sample contains a combination of increments whose lags grow exponentially... I wonder how many different lags the system would need in total, not only to reach beyond the window (say, lag 1500), but also to describe just as effectively all possible states of the same window, only now sliding. (A sketch of such exponentially lagged increments follows this post.)

Maybe I'm driving myself crazy :D but it is not hard to try. The question is when to stop.
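A minimal sketch of what a feature set of increments with exponentially growing lags might look like. The price series p, the base 2 for the lag growth and the maximum lag are illustrative assumptions.

# Feature matrix of increments with exponentially growing lags (1, 2, 4, 8, ...)
exp_lag_increments <- function(p, max_lag = 1024) {
  lags <- 2^(0:floor(log2(max_lag)))
  feats <- sapply(lags, function(l) c(rep(NA, l), diff(p, lag = l)))
  colnames(feats) <- paste0("d_lag", lags)
  feats   # rows containing NA at the start are dropped before training
}

set.seed(2)
p <- 1.10 + cumsum(rnorm(2000, sd = 1e-4))   # stand-in for a price series
X <- exp_lag_increments(p)
head(na.omit(X))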

 
Maxim Dmitrievsky:

In fact, exponential intervals seem to be exactly the right solution to the constant-window problem, as far as I understand it (they should be). Suppose the training sample is 1000 bars (or ticks), it does not matter. Of course, with a sliding window the patterns change and the neural network has to be retrained. But if our sample contains a combination of increments whose lags grow exponentially... I wonder how many different lags the system would need in total, not only to reach beyond the window (say, lag 1500), but also to describe just as effectively all possible states of the same window, only now sliding.

Maybe I'm driving myself crazy :D but it is not hard to try. The question is when to stop.

And you have to stop when the system passes from an unstable to a stable state.

For example, if at a given moment in time we see negentropy increasing (a trend has started), then after a certain time interval the negentropy will return to its characteristic mean value calculated from history.

How do we determine this transition time interval? Well, that is the real prediction, no fooling around. That is what I expect from neural networks, and that is why I read this thread :)
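A minimal sketch of how such a return of negentropy to its mean could be watched for, reusing the moment-based negentropy approximation from the earlier sketch on this page. The window size and the synthetic increment series are assumptions.

library(zoo)

# rolling negentropy of the increments, using the same moment-based approximation
negentropy_approx <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)^2 / 12 + (mean(z^4) - 3)^2 / 48
}

set.seed(3)
r <- rnorm(5000, sd = 1e-5)                                  # stand-in for increments
roll_J <- rollapply(r, width = 500, FUN = negentropy_approx, align = "right")

# watch when the current value comes back to the historical mean of the estimate
plot(roll_J, type = "l")
abline(h = mean(roll_J), lty = 2)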

 
Alexander_K2:

And you have to stop when the system passes from an unstable to a stable state.

That is, for example, if at a given moment in time we have seen negentropy increase (a trend has begun), then after a certain time interval the negentropy will return to its characteristic mean value calculated from history.

How do we determine this transition time interval? Well, that is the real prediction, no fooling around. That is what I expect from neural networks, and that is why I read this thread :)

Ah, then the task is quite trivial: train the model on exponential increments, plot the spread between predicted and current prices, and see how the deviations from the mean (the residuals) are distributed.

I will do it next week :) only replace the word "entropy" with "variance".
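A minimal sketch of that residual analysis. The vectors price and pred are assumed to be aligned; here both are synthetic stand-ins for real data and real model output.

# Distribution of the residuals between predicted and actual prices
set.seed(4)
price <- 1.10 + cumsum(rnorm(1000, sd = 1e-4))
pred  <- price + rnorm(1000, sd = 5e-5)        # stand-in for model predictions

res <- pred - price
hist(res, breaks = 50, main = "Predicted minus actual")
qqnorm(res); qqline(res)                       # how close to normal are they?
acf(res, main = "Autocorrelation of residuals")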

 
Maxim Dmitrievsky:

Ah, then the task is quite trivial: train the model on exponential increments, plot the spread between predicted and current prices, and see how the deviations from the mean (the residuals) are distributed.

I will do it next week :)

Looking forward to it, Maxim. Seriously, I'm sure it can be done. Just don't forget to open a signal - I'll be the first to subscribe.

Only two things can really work in the market: neural networks and probability theory. Everything else is junk, tinkering and, as a result, empty pockets.

 
Alexander_K2:

We are waiting, Maxim. Seriously, I'm sure it can be done. Just don't forget to open a signal - I'll be the first to subscribe.

Only two things can really work in the market: neural networks and probability theory. Everything else is junk, tinkering and, as a result, empty pockets.

I'll show the bot as an example, and you can modify it later :D I'll make it as an indicator at first, so that it is clear.

 

To follow up on this: https://www.mql5.com/ru/forum/86386/page643#comment_6472393

Dr. Trader:

I found another interesting package for screening predictors. It is called FSelector. It offers about a dozen methods for screening predictors, including entropy-based ones.


The predictor-screening function random.forest.importance() showed quite decent results in some of my tests. It is inconvenient that, in its opinion, every predictor is at least slightly important, but if you compute, for example, the average importance and keep only the predictors whose importance is above the average, you get very good results.

library(FSelector)

# load the prepared data and use Rat_DF1 as the training table
load("ALL_cod.RData")
trainTable <- Rat_DF1
PREDICTOR_COLUMNS_SEQ <- 1:27   # columns holding the predictors
TARGET_COLUMN_ID      <- 29     # column holding the target variable

# build the formula "target ~ predictor1 + predictor2 + ..."
targetFormula <- as.simple.formula(colnames(trainTable)[PREDICTOR_COLUMNS_SEQ],
                                   colnames(trainTable)[TARGET_COLUMN_ID])

# importance of every predictor according to a random forest
rfImportance <- random.forest.importance(targetFormula, trainTable)

# keep only the predictors whose importance is above the average importance
colnames(trainTable)[PREDICTOR_COLUMNS_SEQ][rfImportance[[1]] > mean(rfImportance[[1]])]
 
Dr. Trader:

I took just EURUSD M1 for roughly January of this year, with a sliding window of one day.

Logically, if entropy increases, trading should be suspended, and trading should continue while entropy is low. But here, for some reason, the trend occurs at low entropy, even though it is easier to trade in a flat; it is unusual.

(I corrected the typo in the attached code; download it again if you had already downloaded the old version.)

It makes no sense to run tests on the raw quote, because it is obvious to the eye that the series is not stationary.

And it would be interesting (not to me - I always use them) to see the same plots for the time series log(p(t)/p(t-1)).

What is in them? And of course the ordinate axis needs a scale.
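A minimal sketch of the switch being asked for: the same look at the data, but for the log-return series log(p(t)/p(t-1)) instead of the raw quote. The price vector p here is a synthetic stand-in for the EURUSD M1 closes mentioned above.

# raw quote versus log returns
set.seed(5)
p <- 1.10 + cumsum(rnorm(10000, sd = 1e-4))   # stand-in for M1 closes

log_ret <- diff(log(p))

op <- par(mfrow = c(2, 1))
plot(p, type = "l", ylab = "price", main = "Raw quote (non-stationary)")
plot(log_ret, type = "l", ylab = "log return", main = "log(p(t)/p(t-1))")
par(op)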

 
Dr. Trader:

I found another interesting package for screening predictors. It is called FSelector. It offers about a dozen methods for screening predictors, including entropy-based ones.


There is also a very interesting, large set of Relief-type methods in the CORElearn package.

In my long exercises on this subject I have not found anything better than the predictor selection functions in caret, especially saf.

But none of this will work unless the predictors are first pre-selected on the basis of whether they "have something to do" with the target variable.
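A minimal sketch of how both of those could be called on the trainTable / targetFormula objects from the FSelector example above. The estimator name, the rfSA fitness functions and the tiny iteration counts are illustrative assumptions, and caret's safs is assumed to be what "saf" above refers to.

library(CORElearn)
library(caret)

# Relief-type importance from CORElearn (one of several available estimators)
reliefScores <- attrEval(targetFormula, trainTable, estimator = "ReliefFequalK")

# simulated-annealing feature selection from caret (safs) with a random-forest
# fitness function; cross-validation folds and iterations kept tiny here
ctrl <- safsControl(functions = rfSA, method = "cv", number = 3)
safsFit <- safs(x = trainTable[, PREDICTOR_COLUMNS_SEQ],
                y = factor(trainTable[, TARGET_COLUMN_ID]),  # classification target
                iters = 5, safsControl = ctrl)
safsFit$optVariables   # predictors selected by the search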

Let me repeat once more with the example of two classes:

  • part of a predictor's values must be related to one class and another part to the other class. The overlap of these two parts of the predictor's values is what produces the classification error, and it cannot be overcome.



PS.

We discussed principal components earlier, and you saw the drawback that that algorithm works without a teacher (unsupervised).

Here is one with a teacher:

sgpls( x, y, K, eta, scale.x=TRUE,
        eps=1e-5, denom.eps=1e-20, zero.eps=1e-5, maxstep=100,
        br=TRUE, ftype='iden' )
Arguments

x       Matrix of predictors.
y       Vector of class indices.
K       Number of hidden components.
eta     Thresholding parameter. eta should be between 0 and 1.
scale.x Scale predictors by dividing each predictor variable by its sample standard deviation?
eps     An effective zero for change in estimates. Default is 1e-5.
denom.eps       An effective zero for denominators. Default is 1e-20.
zero.eps        An effective zero for success probabilities. Default is 1e-5.
maxstep Maximum number of Newton-Raphson iterations. Default is 100.
br      Apply Firth's bias reduction procedure?
ftype   Type of Firth's bias reduction procedure. Alternatives are "iden" (the approximated version) or "hat" (the original version). Default is "iden".

Package spls.
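A minimal usage sketch of sgpls on toy data; the data dimensions and the K and eta values are illustrative assumptions only.

library(spls)

set.seed(6)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)          # toy predictor matrix
y <- rbinom(100, size = 1, prob = plogis(x[, 1] - x[, 2]))   # toy binary classes

# K hidden components, eta (sparsity) between 0 and 1
fit <- sgpls(x, y, K = 3, eta = 0.6)
print(fit)   # inspect the fitted model and the selected predictors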
