Machine learning in trading: theory, models, practice and algo-trading - page 26

 

The current state of my experiment.

After fixing the errors in the code, I now have the following validation results on my data.

The chart lists currency pairs, forecasting horizons and the so-called gray zones - intervals of the forecasting machine's output values within which no decision to enter the market is taken.

As you can see, I already have a positive mathematical expectation for a number of pairs, with EURUSD the highest. At the same time I am making the experiment more accurate by using the actual spreads for all pairs (from my dealing center):

spreads <- as.data.frame(cbind(
  c('audusd', 'eurusd', 'gbpusd', 'usdcad', 'usdchf'),
  c(0.00018, 0.0001, 0.00014, 0.00013, 0.00012)
))
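(Side note: cbind() on a mix of character and numeric vectors coerces everything to character, so the spread values end up stored as strings. If numeric spreads are needed, a plain data.frame is cleaner - a small sketch, the column names are just illustrative:)

# Sketch: same table, but the spread column stays numeric
spreads <- data.frame(
  pair   = c('audusd', 'eurusd', 'gbpusd', 'usdcad', 'usdchf'),
  spread = c(0.00018, 0.0001, 0.00014, 0.00013, 0.00012),
  stringsAsFactors = FALSE
)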

Now I'm running a really large generalization experiment that searches over even more parameters; it will take about a week and produce roughly 60 times more estimates. After that I'll definitely have solid results to compare. And, as I promised somewhere, I will post the trained Expert Advisor with the basic logic, ready for testing - you can develop it further. For myself, I will improve it by adding various enhancers directly in the MQL code.

I will talk to you soon.

 
Alexey Burnakov:

Nah, you sure don't quite understand the importance of non-stationarity. It doesn't matter whether it's a neural network, a linear model or my model: if your data is non-stationary, then the dependencies found on it are guaranteed not to hold out of sample. Everything in your data that looks like raw price - MA(raw price), bar open (raw price), etc. - should be removed from the model. You need to take their difference from the last known price.

Scaling into an interval is not an option here.
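To illustrate the quoted advice, taking differences from the last known price could look roughly like this (a sketch; the data frame and column names are made up for illustration):

# Sketch: replace raw price levels with their difference from the last known price
prices <- data.frame(open  = c(1.1010, 1.1025, 1.1030),   # made-up raw prices
                     close = c(1.1020, 1.1028, 1.1041))
lastKnown  <- tail(prices$close, 1)     # last known price
pricesDiff <- prices - lastKnown        # differences instead of raw levels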

Scaling into an interval is not all that bad, actually. If you treat each training example separately (that is, do the normalization within a single row of the training table), you can normalize the data in groups, separately within each training example. For example, take the o, h, l, c columns for all 100 bars (400 values in total) within one training example, find their minimum and maximum, and scale them. Repeat this for every training example separately. Such normalization guarantees that the price in every row of the training table always lies strictly in the range [0...1]. Now, if in the future we get a new example whose price has gone far beyond the prices seen during training, normalization will bring it back into [0...1] and it will not be some unknown region of data for the neural network. The trained model may well find and recognize rules like "the price over the last N bars was at least once higher than 0.8", and such a rule will hold on any new data, even if the price has fallen by half.

If you normalize each column separately, as is usually done, the model's results in the fronttest will be worse: the dependencies between same-type predictors within one training example are lost (for example, Open[1], Open[2], Open[3]... from the same training example would be scaled over different intervals, computed across all training examples). There is also the problem that during training the normalization is based on thousands of training examples, while in real trading we would have only one row, which would somehow have to be scaled against itself - which is hard to make sense of and looks strange.

If you do no normalization at all, you might get away with it. But if the price falls or rises outside the interval that was available during training, it becomes a new and completely unknown region of data for the model. The model won't cope with it and will fail.

All this applies strictly to neural networks, from my experience.
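A minimal sketch of that per-example normalization (the matrix and its size here are made up; rows are training examples, columns are the o,h,l,c values of the bars):

# Sketch: min-max scale each training example (each row) separately,
# using one common min/max over all of its o,h,l,c values
normalizeRow <- function(row) {
  (row - min(row)) / (max(row) - min(row))   # maps the row into [0, 1]
}
# dmExample is a hypothetical matrix: 3 training examples,
# 400 columns = o,h,l,c for the last 100 bars
dmExample <- matrix(runif(3 * 400, min = 1.05, max = 1.15), nrow = 3)
dmScaled  <- t(apply(dmExample, 1, normalizeRow))
range(dmScaled[1, ])   # every row now lies strictly within [0, 1]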

 

Worked a little more with y-aware PCA. Found a good article with the variance formulas (http://www.chemometrics.ru/materials/textbooks/pca.htm) and made the following in R.

The basic code is from the same article: http://www.r-bloggers.com/principal-components-regression-pt-2-y-aware-methods/
You need to run that code up to "princ <- prcomp(dmTrain, center = FALSE, scale. = FALSE)" and stop there. Then:

extractProjection <- function(ndim, princ) {
  return(princ$rotation[,1:ndim]) # this function was defined earlier, but this time the signs in the matrices must not be flipped as before
}
countOfComponentsToUse <- ncol(princ$x) # numbers from 2 to ncol(princ$x) can be substituted here in a loop, increasing the number of components until the required accuracy is reached
PCnameList <- colnames(princ$x)[1:countOfComponentsToUse]
proj <- extractProjection(countOfComponentsToUse, princ)
residualsMatrix <- dmTrain - ((princ$x[,1:countOfComponentsToUse]) %*% (t(proj)))
V0 <- sum(residualsMatrix*residualsMatrix)/nrow(dmTrain)
TRV <- V0/ncol(dmTrain)                           # total residual variance (mean squared error)
ERV <- 1-nrow(dmTrain)*V0/sum(dmTrain*dmTrain)    # explained residual variance; increase the number of components if ERV < 0.95

The bottom line is approximately the following: the princ object contains a scores matrix (princ$x) and a loadings matrix (princ$rotation). If we multiply these two matrices, princ$x %*% t(princ$rotation), we get back the original table with the dmTrain data (scaled by Y, without the target variable).
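This is easy to sanity-check (assuming princ and dmTrain from the code above): with all components kept, the product restores dmTrain up to floating-point error.

max(abs(dmTrain - princ$x %*% t(princ$rotation)))   # should be practically zero (floating-point noise)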

You can limit the number of principal components, and then the raw data will be reconstructed with some error. If you remember slow internet and progressive JPEG images that load blurry and gradually sharpen, this is something similar: the more principal components we take, the more accurately we match the raw data.

For N components, the original data is reconstructed as princ$x[,1:N] %*% t(princ$rotation[,1:N]). The error can be calculated by subtracting the resulting matrix from dmTrain, which gives residualsMatrix - the errors, i.e. how much the reconstructed data differs from the real data. TRV is the mean of the squared errors. ERV is something like an average accuracy: with ERV = 0.8, the original data differs from the reconstructed values by roughly 20% in each cell of the table. That is, if the reconstruction gives 10, the original value was most likely somewhere between 8 and 12. This is a very rough interpretation, but an intuitive one. ERV should be at least 0.95 for the PCA model to be considered to contain enough components.

What does this give? You can add the tol parameter, princ <- prcomp(dmTrain, center = FALSE, scale. = FALSE, tol = 0.01), to avoid generating thousands of principal components; generation stops once the sdev of a new component falls below sdev(PC1) * tol. After that, the number of components actually used can be picked with the function above, starting with 2 and increasing by 1.
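Putting this together, the component search can be wrapped in a simple loop (a sketch based on the code above; calcERV is a hypothetical helper that just repeats the ERV calculation for a given number of components):

# Sketch: increase the number of components until ERV reaches 0.95
calcERV <- function(ndim, princ, dmTrain) {
  proj <- extractProjection(ndim, princ)
  residualsMatrix <- dmTrain - princ$x[, 1:ndim, drop = FALSE] %*% t(proj)
  V0 <- sum(residualsMatrix * residualsMatrix) / nrow(dmTrain)
  1 - nrow(dmTrain) * V0 / sum(dmTrain * dmTrain)   # ERV
}
countOfComponentsToUse <- 2
while (calcERV(countOfComponentsToUse, princ, dmTrain) < 0.95 &&
       countOfComponentsToUse < ncol(princ$x)) {
  countOfComponentsToUse <- countOfComponentsToUse + 1
}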

I tried to think how to apply this for sifting out predictors, but nothing comes to mind so far. For example, one could set the loadings of one predictor to 0 in princ$rotation, recalculate ERV and see how much worse the result gets. True, I don't see the point of this exercise - it's not clear how to use the result. Perhaps it could find correlated predictors that carry no new information, so removing them would not worsen the result. What I'd really like to find are noise predictors, but these matrices have no connection to the target variable, i.e. there is no criterion at all for what is noise and what is not.
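For reference, the experiment described above could look roughly like this (a sketch; predictorToTest is just an illustrative index):

# Sketch: zero out the loadings of one predictor and see how much ERV drops
predictorToTest <- 1                      # row index of the predictor in princ$rotation
rotationZeroed  <- princ$rotation
rotationZeroed[predictorToTest, ] <- 0
residualsZeroed <- dmTrain - princ$x %*% t(rotationZeroed)
V0zeroed  <- sum(residualsZeroed * residualsZeroed) / nrow(dmTrain)
ERVzeroed <- 1 - nrow(dmTrain) * V0zeroed / sum(dmTrain * dmTrain)
ERVzeroed   # compare with the ERV computed from the unmodified rotation matrix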

 
Dr.Trader:

Worked a little more with y-aware PCA. Found a good article with the variance formulas (http://www.chemometrics.ru/materials/textbooks/pca.htm) and made the following in R.

The basic code is from the same article: http://www.r-bloggers.com/principal-components-regression-pt-2-y-aware-methods/

I reread the article. It seems you are not simply repeating the example from the article step by step with your own data substituted in. Or did I miss something? If not, why not?
 

I did repeat the example, substituting my own data, and got 100+ principal components for a variance of 0.95. I looked at the loading plots, and clearly good predictors did not stand out. I.e. while the author sees on his data that he can keep 2 principal components and 5 predictors, on my data I see 100+ components and 1000+ predictors (the loadings decrease smoothly, and it's not even clear at what threshold value of the loadings to cut off).

Although I have to hand it to y-aware PCA: I just substituted my data without any pre-screening, built a Y-Aware PCA model on it, and got a 45% error in the fronttest. That is not yet profitable for forex, but the article ends there, so if I am to use y-aware PCA I need to come up with something else.

With other methods I can keep only a dozen predictors, train nnet, and get a fronttest error of just 30%. I would like to get a similar result with y-aware PCA.
 
Dr.Trader:

I did repeat the example, substituting my own data, and got 100+ principal components for a variance of 0.95. I looked at the loading plots, and clearly good predictors did not stand out. I.e. while the author sees on his data that he can keep 2 principal components and 5 predictors, on my data I see 100+ components and 1000+ predictors (the loadings decrease smoothly, and it's not even clear at what threshold value of the loadings to cut off).

Although I have to hand it to y-aware PCA: I just substituted my data without any pre-screening, built a Y-Aware PCA model on it, and got a 45% error in the fronttest. That is not yet profitable for forex, but the article ends there, so if I am to use y-aware PCA I need to come up with something else.

With other methods I can keep only a dozen predictors, train nnet, and get a fronttest error of just 30%. I would like to get a similar result with y-aware PCA.
It seems to me that it needs to be repeated letter for letter. The article uses packages besides PCA that I did not see in your code.
 

https://c.mql5.com/3/97/Principal_Components_Regression__1.zip

Here is the R code from the article, files _03.txt and _04.txt; I ran it all on my data. I even added a fronttest data check to _04.txt. The only difference I see is the etal package suggested at the end of the article, but there are not even any examples of it there - they just suggest trying it and comparing the result with what vtreat prune does.

 
Dr.Trader:

https://c.mql5.com/3/97/Principal_Components_Regression__1.zip

Here is the R code from the article, files _03.txt and _04.txt; I ran it all on my data. I even added a fronttest data check to _04.txt. The only difference I see is the etal package suggested at the end of the article, but there are not even any examples of it there - they just suggest trying it and comparing the result with what vtreat prune does.

It looks pretty solid.

So no useful result?

 
Methodological notes on selecting informative features (feature selection)
  • habrahabr.ru
Hi all! My name is Alexey. I am a Data Scientist at Align Technology. In this material I will talk about the approaches to feature selection that we use in our data analysis experiments. At our company, statisticians and machine learning engineers analyse large volumes of clinical information related to patient treatment...
 

Hello!

I have an idea I want to test, but I don't know a tool to implement it... I need an algorithm that can predict several points ahead, say 3 or 5 (preferably a neural network).

I've only worked with classification before, so I don't even understand what this should look like. Can someone advise how to do it or recommend an R package?

p.s. Great article Alexey
