Discussion of article "Evaluation and selection of variables for machine learning models" - page 4

 
JulInParis:

Hi Vlad,

I'm trying to rerun your example step by step.

In the section "Input data", the In(p = 16) function works on a price object. What is its R class (zoo, xts or data.frame), and what does it look like (column names, etc.)? Without this information, it's impossible to run the command x <- In(p = 16) ...

 

Best regards.

 

Julien 

Hi Julien,

> class(price)
[1] "matrix"
> colnames(price)
[1] "Open"  "High"  "Low"   "Close" "Med"   "CO"

I have attached a snapshot of the session. Open it in RStudio and experiment.

Good luck

Vladimir
 

Files:
EURUSD30.zip  302 kb
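For readers who can't open the attached session, a minimal sketch of an object with the same shape as price may help: a plain numeric matrix with the six columns listed above. The quote values are made up, and the formulas used for Med ((High + Low) / 2) and CO (Close - Open) are assumptions inferred from the column names, not confirmed in this thread. Note that since R 4.0.0, class() on a matrix returns c("matrix", "array") rather than just "matrix".

```r
# Hypothetical toy quotes; the real data are in the attached EURUSD30 session
open  <- c(1.1000, 1.1010, 1.1005, 1.1020, 1.1015)
high  <- c(1.1015, 1.1020, 1.1025, 1.1030, 1.1025)
low   <- c(1.0995, 1.1000, 1.1000, 1.1010, 1.1005)
close <- c(1.1010, 1.1005, 1.1020, 1.1015, 1.1010)

# Plain numeric matrix with named OHLC columns
price <- cbind(Open = open, High = high, Low = low, Close = close)

# Assumed derived columns: Med = (High + Low) / 2, CO = Close - Open
price <- cbind(price,
               Med = (price[, "High"] + price[, "Low"]) / 2,
               CO  = price[, "Close"] - price[, "Open"])

print(class(price))     # "matrix" "array" on R >= 4.0.0
print(colnames(price))  # "Open" "High" "Low" "Close" "Med" "CO"
```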
 
Zhi Long Yang:
Thank you very much to the author of the article. I've just started and have run into a problem. I installed RStudio rather than Revolution R Open 3.2.1, as the author suggests. The "RandomUniformForests" and "RoughSets" packages are loaded, but the nearZeroVar() and findLinearCombos() functions don't work. Are these functions specific to Revolution R Open?

Revolution R Open (now maintained by Microsoft and renamed MRO) is an enhanced distribution of R, whereas RStudio is just an IDE, so the two are not comparable. The two functions you mention are clearly labelled in the author's original article as functions of the caret package. Also, the author of the original article writes in Russian; communicating in English may still work, but communicating in Chinese seems unlikely to.
 

See caret::nearZeroVar() and caret::findLinearCombos().

Good luck
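Both functions belong to the caret package, not to RandomUniformForests or RoughSets, so caret must be installed. A minimal sketch on made-up data, assuming caret is installed: a constant column is flagged by nearZeroVar(), and an exact linear combination of two other columns is flagged by findLinearCombos().

```r
# Assumes the caret package is installed: install.packages("caret")
set.seed(42)
n <- 100
a <- rnorm(n)
b <- rnorm(n)

# Made-up predictors: one constant column and one exact linear combination
df <- data.frame(a = a, b = b,
                 const = rep(1, n),  # a single unique value: zero variance
                 ab = a + b)         # exactly a + b

# Indices of zero- and near-zero-variance columns
nzv <- caret::nearZeroVar(df)
print(names(df)[nzv])

# Check the remaining columns for exact linear dependencies
keep <- setdiff(seq_along(df), nzv)  # safe even when nzv is empty
m <- as.matrix(df[, keep])
lc <- caret::findLinearCombos(m)
print(colnames(m)[lc$remove])        # columns caret suggests dropping
```

Calling the functions with the caret:: prefix avoids attaching the package; library(caret) followed by the bare names works just as well. The setdiff() step matters because df[, -integer(0)] would silently select no columns at all.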

 
Vladimir Perervenko:

Hi Julien,

> class(price)
[1] "matrix"
> colnames(price)
[1] "Open"  "High"  "Low"   "Close" "Med"   "CO"

I have attached a snapshot of the session. Open it in RStudio and experiment.

Good luck

Vladimir
 


Dear all, 


Can someone tell me what the variable Dig, defined in the ZZ function, means? Is it a constant? If so, what value should it have?

 
hzmarrou :


Dear all, 


Can someone tell me what the variable Dig, defined in the ZZ function, means? Is it a constant? If so, what value should it have?

I answered you in the other thread.
 

Hi Vladimir,


Forgive the silly question, but I'm trying to build my own (very simple) model starting from your nice example, and I'm wondering why you shift the ZZ differences forward in the ZZ function:

 dz <- zz %>% diff %>% c(0,.)

...I mean, after all, we want to train a model to predict the FUTURE value of the zigzag, so what's the point of training a model on predictors (technical indicators) that summarize the market quotes at the end of day N, with a target that is the sign of the difference between the zigzag value on day N and its value on day N-1 (which is what you get after shifting)? Shouldn't we use the sign of the difference between the zigzag's value on day N+1 and its value on day N instead (i.e., without the shift)?

I know I must have missed something obvious in your methodology, but if you could take 5 minutes to make this clear to me, I'd be very grateful.


Best regards.


Julien
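To make the alignment in this question concrete, a small base-R sketch on a made-up zigzag series (the values are hypothetical): c(0, diff(zz)), which is what zz %>% diff %>% c(0,.) evaluates to, gives bar n the direction of the move from bar n-1 to bar n, whereas the target described in the question (the move from bar n to bar n+1) is the same vector shifted one bar earlier, i.e. a one-bar lead.

```r
# Hypothetical zigzag values on six consecutive bars
zz <- c(1.0, 1.1, 1.2, 1.1, 1.0, 1.1)

# What zz %>% diff %>% c(0, .) computes: direction of the move INTO bar n
sig_back <- sign(c(0, diff(zz)))

# Direction of the move OUT OF bar n (what a forecast of bar n+1 needs);
# the last bar has no successor, hence NA
sig_fwd <- sign(c(diff(zz), NA))

# sig_fwd is sig_back moved one bar toward the past (a one-bar lead)
stopifnot(identical(sig_fwd, c(sig_back[-1], NA)))

print(rbind(zz, sig_back, sig_fwd))
```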

 
JulInParis :

Hi Vladimir,


Forgive the silly question, but I'm trying to build my own (very simple) model starting from your nice example, and I'm wondering why you shift the ZZ differences forward in the ZZ function:

 dz <- zz %>% diff %>% c(0,.)

...I mean, after all, we want to train a model to predict the FUTURE value of the zigzag, so what's the point of training a model on predictors (technical indicators) that summarize the market quotes at the end of day N, with a target that is the sign of the difference between the zigzag value on day N and its value on day N-1 (which is what you get after shifting)? Shouldn't we use the sign of the difference between the zigzag's value on day N+1 and its value on day N instead (i.e., without the shift)?

I know I must have missed something obvious in your methodology, but if you could take 5 minutes to make this clear to me, I'd be very grateful.


Best regards.


Julien

The question is valid: there is a typo in the article. It should read as follows:

1. Calculate the inputs:

 x <- In(p = 16)

2. Calculate the target:

 out1 <- ZZ(ch = 25)
 

> head(out1)
         zz sig
[1,] 84.213   0
[2,] 84.199  -1
[3,] 84.185  -1
[4,] 84.171  -1
[5,] 84.157  -1
[6,] 84.143  -1
> tail(out1)
             zz sig
[4995,] 89.3965   0
[4996,] 89.3965   0
[4997,] 89.3965   0
[4998,] 89.3965   0
[4999,] 89.3965   0
[5000,] 89.3965   0

3. Combine x and out1 into data, as follows:

  • Delete the examples where sig == 0
  • Create a new variable Class (factor)
  • Shift the Class variable one bar into the "future"
  • Remove the variable sig from the set

 data <- cbind(x, sig = out1[ , 2]) %>% tibble::as_tibble() %>%
   dplyr::filter(sig != 0) %>%
   dplyr::mutate(Class = factor(sig, ordered = FALSE) %>% dplyr::lead()) %>%
   dplyr::select(-sig) %>%
   na.omit()

> data %>% str()
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4944 obs. of  18 variables:
 $ DX    : num  0.355 0.541 6.324 3.026 9.511 ...
 $ ADX   : num  12 11.3 11 10.5 10.4 ...
 $ oscDX : num  0.303 0.427 5.012 2.459 -8.641 ...
 $ ar    : num  -18.8 -18.8 -18.8 -18.8 -12.5 ...
 $ tr    : num  0.032 0.051 0.037 0.004 0.011 ...
 $ atr   : num  0.0422 0.0432 0.0425 0.038 0.0348 ...
 $ cci   : num  -14.75 20.6 27.23 6.22 -33.27 ...
 $ chv   : num  0.0422 0.03 -0.0439 -0.0456 -0.1172 ...
 $ cmo   : num  -16.3 -20.1 -26.5 -39.2 -40.7 ...
 $ sign  : num  -0.0137 -0.013 -0.0117 -0.0107 -0.0108 ...
 $ vsig  : num  -0.00352 0.00655 0.0132 0.01059 -0.00103 ...
 $ rsi   : num  45.7 49.8 50 46.8 42.4 ...
 $ slowD : num  0.408 0.438 0.447 0.43 0.405 ...
 $ oscK  : num  0.0137 0.039 -0.0116 -0.0427 -0.0322 ...
 $ SMI   : num  -18.2 -16.6 -15.8 -16.2 -17.1 ...
 $ signal: num  -12.8 -13.6 -14 -14.5 -15 ...
 $ vol   : num  0.01005 0.01004 0.00985 0.00975 0.00946 ...
 $ Class : Factor w/ 2 levels "-1","1": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:34] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..- attr(*, "names")= chr [1:34] "1" "2" "3" "4" ...
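The order of operations in point 3 can be sketched in base R on toy data, for readers who want to check what the shift does. The predictors f1, f2 and the sig values below are made up, and plain row indexing stands in for dplyr::lead(); this is an illustration, not the author's code.

```r
# Toy inputs: two made-up predictors (NA mimics an indicator warm-up period)
x   <- data.frame(f1 = c(NA, 0.2, 0.4, 0.1, 0.3, 0.5, 0.6),
                  f2 = c(1.0, 1.2, 1.5, 0.5, 0.7, 0.9, 1.1))
# Made-up zigzag direction signal for the same seven bars
sig <- c(1, 1, 0, -1, -1, 1, 1)

data <- cbind(x, sig = sig)
data <- data[data$sig != 0, ]          # drop rows where sig == 0
data$Class <- factor(data$sig)         # target as an unordered factor
n <- nrow(data)
data$Class <- data$Class[c(2:n, NA)]   # one-row lead, like dplyr::lead()
data$sig <- NULL                       # sig is not kept as a predictor
data <- na.omit(data)                  # drops the NA predictor row and the last row

print(data)
```

Because the shift is applied after the sig == 0 rows are removed, "one bar ahead" means the next remaining row: in the toy data, row 2 takes its Class from row 4, since row 3 was dropped.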

The rest is as in the article text.

Good luck

 
MetaQuotes Software Corp.:

NEW ARTICLE Variable Evaluation and Selection for Machine Learning Models has been published:

By Vladimir Perervenko


There is a big problem with using the zigzag signal as the target variable.

All the models are built only on bars already known a priori to be zigzag points (-1, 1); the other bars, with state 0, are excluded.

In practice, you don't know whether a given bar is a zigzag point (-1, 1) or not, and there is a high probability that it is a bar with state 0, because the two states (-1, 1) and (0) cannot be told apart in advance.

So the same calculation and judgement is needed for the bars with state 0. As it stands, the trained model and the model in live use will deviate substantially.

 
freewalk :

Using the zigzag signal as the target variable is very problematic.

All the models are built only on bars already known a priori to be zigzag points (-1, 1); the other bars, with state 0, are excluded.

In practice, you don't know whether a given bar is a zigzag point (-1, 1) or not, and there is a high probability that it is a bar with state 0, because the two states (-1, 1) and (0) cannot be told apart in advance.

So the same calculation and judgement is needed for the bars with state 0. At that point, the trained model and the model in live use will deviate substantially.

Could you draw a simple chart illustrating the values (-1, 1, 0)?

Have you read the article carefully? And the related ones? Or do you not know how ZZ is used?

Maybe the translation is poor?

Please state your comments more precisely. Could you perhaps phrase them in clearer English?
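To put numbers on the point under discussion: the head() and tail() of out1 posted earlier in the thread suggest that zz moves by a constant step between extremes (84.213, 84.199, 84.185, ...), so sig is -1 or 1 on those bars, and that sig is 0 where zz stays flat (89.3965 at the end of the series). A minimal base-R sketch on a made-up, linearly interpolated zigzag shows which bars a sig != 0 filter would remove. Linear interpolation between extremes and a flat, unfinished tail are assumptions read off that output, not a description of the ZZ() source.

```r
# Hypothetical zigzag extremes and the bars where they occur
ext_bar <- c(1, 5, 8)
ext_val <- c(1.00, 1.20, 1.05)

# Linear interpolation between extremes, then a flat tail (no new extreme yet)
zz <- approx(ext_bar, ext_val, xout = 1:8)$y
zz <- c(zz, rep(zz[8], 3))

sig <- sign(c(0, diff(zz)))
print(rbind(bar = seq_along(zz), zz = round(zz, 4), sig = sig))

# Bars that a filter such as sig != 0 would drop
print(which(sig == 0))
```

On this toy series the filter drops the first bar and the flat tail after the last confirmed extreme; every bar inside a completed leg keeps a sig of -1 or 1.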