Machine learning in trading: theory, models, practice and algo-trading - page 36

 
Dr.Trader:

I don't think this package is enough to build a model capable of predicting the target variable. All I found in the help is to build a PCA model based on predictors, the target variable is not there at all.


Hello, you used the wrong function; you should have used "nlPca", as stated on my website. But it's my fault, I should have given more details...

Here, try fitted instead of predict; maybe you will succeed:

source("https://bioconductor.org/biocLite.R")
biocLite("pcaMethods")        

# create the pca object
library(pcaMethods)

##  Data set with three variables where data points constitute a helix
data(helix)
helixNA <- helix
##  not a single complete observation
helixNA <- t(apply(helix, 1, function(x) { x[sample(1:3, 1)] <- NA; x}))
## 50 steps is not enough, for good estimation use 1000
helixNlPca <- pca(helixNA, nPcs=1, method="nlpca", maxSteps=50)
fittedData <- fitted(helixNlPca, helixNA)
plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
		
 
Dr.Trader:

I don't think this package is enough to build a model capable of predicting the target variable. All I found in the help is to build a PCA model based on predictors, the target variable is not there at all.

This would create a resNipals (Nonlinear Estimation by Iterative Partial Least Squares) object with 5 principal components for analyzing the metaboliteDataComplete table. Instead of metaboliteDataComplete, you can substitute your own table with predictors. It is important not to feed the target variable in here; it will be used later.

But this is only enough to analyze the relationships between the variables by examining various graphs. To create a predictive model, a linear regression model is then built that uses the principal components PC1, PC2, PC3, PC4, PC5 as its input variables (x1, x2, x3, ...), and the target variable Y is fed to the linear model as the desired result. The problem is that resNipals is an object of class "pcaRes" from the pcaMethods package, and I couldn't find in the help how to do all this with it.

If it were a PCA model from the caret package, I would know how to proceed.

But that doesn't work with resNipals. In theory the pcaMethods package should have some functions of its own for working with this object, but I haven't found anything.
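The pipeline being described (principal-component scores as inputs x1..x5 to a linear model, with the target fed in only at the regression step) can be sketched in base R. This is an illustration only: prcomp() stands in for the pcaMethods "pcaRes" object, and the data and variable names are invented for the example.

```r
# Sketch: PCA on the predictors only, then linear regression on the component scores.
set.seed(1)
X <- as.data.frame(matrix(rnorm(200 * 10), ncol = 10))  # 200 rows, 10 predictors
Y <- X[[1]] - X[[2]] + rnorm(200, sd = 0.1)             # synthetic target, NOT fed to the PCA

pcaModel <- prcomp(X, center = TRUE, scale. = TRUE)
pcs <- as.data.frame(pcaModel$x[, 1:5])                 # scores of PC1..PC5
fit <- lm(Y ~ ., data = cbind(pcs, Y = Y))              # target enters only at this step

# for new rows: project them into PC space, then predict with the linear model
newScores <- as.data.frame(predict(pcaModel, X[1:3, ]))[, 1:5]
predict(fit, newdata = newScores)
```

The same two-step shape should apply to a pcaRes object once its component scores are extracted.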

Originally, PCA was meant to solve two problems:

1. There is a very large number of predictors with a small number of observations. This is common in organic chemistry, genetics, and so on. In our case it would be the use of macroeconomic data on large timeframes, such as yearly ones.

2. There are correlations between predictors.

Therefore, PCA algorithms solve these two main problems:

1. Replace the original predictors with a new, often radically smaller, set of predictors. In doing so, the algorithm ensures that this small set explains a certain percentage of the variability of the original predictor set, for example 95%; this value is chosen by the researcher.

2. The new predictors have ZERO correlation with each other.
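Both properties are easy to check empirically with base R's prcomp (an illustrative sketch; the data and variable names are invented):

```r
# Two highly correlated predictors plus one independent predictor
set.seed(0)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # almost a copy of x1
x3 <- rnorm(100)
p <- prcomp(cbind(x1, x2, x3), center = TRUE, scale. = TRUE)

round(cor(p$x), 10)                 # component scores: off-diagonal correlations are numerically zero
cumsum(p$sdev^2) / sum(p$sdev^2)    # cumulative explained variance; keep enough PCs to reach e.g. 0.95
```

Here the first two components already pass 95% of the variability, because x1 and x2 are nearly redundant.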

It follows that for us PCA is one of the algorithms for preliminary preparation of the initial data for modeling, but it cannot replace modeling for the purpose of predicting the target variable.

It seems to me that amid the discussion of various PCA details, the meaning of my statements on this subject was lost. So let me remind you: I gave a reference to using PCA in a way that not only reduces the number of original predictors but also eliminates predictors that are noise with respect to the target variable, which is what is being discussed in this thread.

So I suggest going back to the problem of noise among predictors, and the possible use of a very specific idea of applying PCA to solve this problem.

 
Dr.Trader:

This will create resNipals (Nonlinear Estimation by Iterative Partial Least Squares) object with 5 main components to analyze metaboliteDataComplete table. Instead of metaboliteDataComplete, you can substitute your own table ...........

The one you tried is NIPALS; it is shown in the second picture on the website, and it is also not very separable. You should take the one in the third picture, neural network PCA.

=========================

a little off topic but need help with the code....

I have predictors in columns, and I want to compute the difference between all predictors in all combinations. I have done that, but now I need to name each combination sensibly to understand what is what.

Let's say we have columns with predictors "A", "B", "C"

I make combinations of differences

1) A - B

2) A - C

3) C - B

Question: how do I give new columns the names like "a_minus_b" , "a_minus_c"

I'm just mastering "R" and programming in general, so I'm not familiar with such tricks

What should I add to this code?

a <- 1:5
b <- 6:10
c <- 11:15
d <- 16:20
dt <- data.frame(a,b,c,d) 
dt

#  all combinations of indices between two variables
#  I also transpose (flip) the matrix - it's easier for me to read that way
combi <- t(  combn(1:ncol(dt),2)  )  
combi  

#  empty data frame where I will store the results computed for the combinations
res.dt <- as.data.frame(  matrix(nrow = nrow(dt) , ncol = nrow(combi))   )
res.dt

for(i in 1:ncol(res.dt)){
  #  subtract one variable from another in every combination
  #  and write the result into res.dt
  ii <- combi[i,1]
  jj <- combi[i,2]
  
  res.dt[,i] <- dt[,ii] - dt[,jj]
}
res.dt
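One possible way to build the names automatically from the column pairs in `combi` (a sketch based on the code above; `_minus_` is just one separator choice):

```r
a <- 1:5; b <- 6:10; c <- 11:15; d <- 16:20
dt <- data.frame(a, b, c, d)
combi <- t(combn(1:ncol(dt), 2))   # all pairs of column indices

# subtract the paired columns in one step...
res.dt <- as.data.frame(dt[, combi[, 1]] - dt[, combi[, 2]])
# ...and name each result column after its pair of source columns
colnames(res.dt) <- paste(colnames(dt)[combi[, 1]],
                          colnames(dt)[combi[, 2]],
                          sep = "_minus_")
head(res.dt)   # columns: a_minus_b, a_minus_c, a_minus_d, b_minus_c, b_minus_d, c_minus_d
```

This works for any number of columns, so 1000 variables need no manual typing.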
 
mytarmailS:

The one you tried is the Nipals , it is shown in the second picture on the website, it is also not very separable, you should take the one in the third picture neural network PCA

...

Approximately like this.

colnames(res.dt) <- c("...", "...")

There is also names().

See the reference documentation.

 
SanSanych Fomenko:

The PCA was originally designed to solve two problems:

PCA was originally intended to reduce the dimensionality of the original series. That's it. To use it for selecting predictors is delirious nonsense.
 
SanSanych Fomenko:

Approximately like this.

colnames() <- c(".", ".")

And then there are names

See reference.

))) I understand that, but if there are 1000 variables, why write each one manually?
 

Dr.Trader:

...

The data is taken from EURUSD D1; class 0 or 1 means a price decrease or increase on the next bar. If the model correctly predicts the result for test.csv in at least 6 out of 10 cases, you can already try trading with it on Forex; in principle it won't fail, but you shouldn't expect too much profit. If it predicts correctly in 7 out of 10 cases (and above), that is the right way to the grail; we should then try training and testing the model on other years and months, and if it is the same everywhere, that's very good.

...


More specifically in the report file:

/**
* The quality of modeling in out of sample:
*
* TruePositives: 182
* TrueNegatives: 181
* FalsePositives: 1
* FalseNegatives: 1
* Total patterns in out of samples with statistics: 365
* Total errors in out of sample: 2
* Sensitivity of generalization abiliy: 99.4535519125683%
* Specificity of generalization ability: 99.45054945054946%
* Generalization ability: 98.90410136311776%
* Indicator by Reshetov: 8.852456238401455
*/


It's time to create a team and do an Open Source project: an automated system for this purpose in MQL5 and Java. I'll show the source code of the binary classifier in Java and the MQL5 script that creates a sample for training models.

A rough plan of how the complex will work:

  1. For each financial instrument, separate robots run on the charts; they are triggered at the bar-open prices and save the patterns to files.
  2. The Java application (the binary classifier) loads the models for each symbol and polls at a 1-second interval for the files created by the robots from step 1. From each file it reads the symbol ticker and a pattern for it, after which the pattern file is deleted from disk. Based on the pattern, it classifies a trading signal using the corresponding model and writes the signal to a file.
  3. One robot waits for the signal files from step 2. As soon as such a file is found, the robot reads the instrument and the signal from it and, according to the signal, opens or reverses a position on the instrument. The read file is deleted.

If you are interested in joining the project and know how to program in Java or MQL5, then please subscribe to this thread.

 
mytarmailS:

Hello! You used the wrong function; you should have used "nlPca", as indicated on the site I gave, but it's my fault, I should have provided more detailed information...

Here instead of predict - fitted , try it, maybe you will succeed

This example is unfortunately from another field. In Forex, for example, we can always get 100% of the data, but in other areas where data is acquired experimentally there will always be gaps, missing values. In this example PCA is used to reconstruct missing values in the predictors themselves: values in one of three columns are randomly cleared, a pca model is created, and it is used to reconstruct the missing values.

I've never tried this, but technically you could also treat the target variable as a predictor by including it in the pca model. Then, in new data its value will be unknown, so pca can fill in those missing values.

source("https://bioconductor.org/biocLite.R")
biocLite("pcaMethods")        

# create the pca object
library(pcaMethods)

# trainData - table with the training examples. The target variable must also be in this same table.
# Example columns: Close,Hour,MA30,target (target - the target variable, with values such as 0 or 1 denoting a fall/rise of the price)
## 50 steps is not enough, for good estimation use 1000
NlPca <- pca(trainData, nPcs=1, method="nlpca", maxSteps=50)
# newData - table with new data, for testing the model. The columns must be the same: Close,Hour,MA30,target
newData[,"target"] <- NA  # the target variable must not be known to the model on new data; where it is NA, the fitted function should substitute suitable values
fittedNewData <- fitted(NlPca, newData)
fittedNewData[,"target"] #  your prediction result

I can't start a new normal line :/ In short, you can do it as I wrote above, but this is reconstruction rather than prediction. The PCA model may simply find some values that "fit"; there are no guarantees. I assume the error will be 50%.
 
Combinator:
PCA was originally intended to reduce the dimensionality of the original series. That's it. To use it to select predictors is the most delusional nonsense.

No, that's okay. Since the predictors used for each of the principal components are known, you can safely sift out the predictors that are not used in the principal components. Read this, I liked it: http://www.win-vector.com/blog/2016/05/pcr_part2_yaware/ There the data contains 5 good predictors and dozens of noise predictors; the analysis sifts out the noise. I used the same code on other examples appearing in this thread, and it generally works. But Forex is more complicated; I haven't gotten such nice pictures on it with indicators, so I need to think of something smarter.
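A rough sketch of the "y-aware" idea from that post, with invented data and my own variable names (this only approximates the post's preprocessing, which also centers each rescaled predictor):

```r
set.seed(2)
n <- 300
good  <- matrix(rnorm(n * 5),  ncol = 5,  dimnames = list(NULL, paste0("g", 1:5)))
noise <- matrix(rnorm(n * 20), ncol = 20, dimnames = list(NULL, paste0("n", 1:20)))
X <- data.frame(good, noise)
y <- rowSums(good) + rnorm(n)    # only the g-columns drive the target

# rescale each predictor by its univariate regression slope against y,
# so columns unrelated to y shrink toward zero before the PCA
slopes <- sapply(X, function(col) coef(lm(y ~ col))[2])
Xs <- sweep(X, 2, slopes, `*`)
p  <- prcomp(Xs, center = TRUE, scale. = FALSE)

# predictors with large absolute loadings on the leading components are kept;
# here the g-columns dominate and the n-columns can be sifted out
head(sort(abs(p$rotation[, 1]), decreasing = TRUE))
```

The point is that after this rescaling, the leading components are dominated by predictors related to the target, so the noise predictors reveal themselves through near-zero loadings.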

 
Dr.Trader:

This example is unfortunately from another topic. In Forex, for example, we can always extract 100% of the data, but in other fields where data is obtained experimentally, there will always be gaps, missing values. In this example, PCA is used to reconstruct missing values in the predictors themselves. In the example, they randomly clear a value in one of the three columns, create a pca model, and use it to reconstruct missing values.

I've never tried this, but technically you could also treat the target variable as a predictor by including it in the pca model. Then, in new data its value will be unknown, so pca can add those missing values.

Damn, I suspected it was no accident that those NA values are thrown into the data. But I read the manual, and it clearly says PCA with a neural network; so it's still unclear how that guy from the site got this beautiful picture with good class separation.