Machine learning in trading: theory, models, practice and algo-trading - page 39

 
Yuri, I was able to train a vmr model, but I cannot predict new data with it; please tell me what is wrong. I choose the Load model menu, choose the train.vmr file from the attachment, then click Use model. I see the text "Enter data and press "OK"" in the log. But that's it: no matter what data I enter, I still see a grey inactive OK button and no feedback. The train.csv and test.csv files are also attached; they are the ones you posted earlier.
Files:
train.zip  38 kb
 
Dr.Trader:
Yuri, I was able to train a vmr model, but I cannot predict new data with it; please tell me what is wrong. I choose the Load model menu, choose the train.vmr file from the attachment, then click Use model. I see the text "Enter data and press "OK"" in the log. But that's it: no matter what data I enter, I still see a grey inactive OK button and no feedback. The train.csv and test.csv files are also attached; they are the ones you posted earlier.

You probably entered something non-numeric? That triggers an exception handler and deactivates the OK button. In that case you need to restart jPrediction.

In the near future I will deal with this bug, so that jPrediction at least issues a warning message when the user enters non-numeric data.

P.S. I have already fixed it.

For example, let's enter the symbol "z" instead of a number. As a result you get an error message:

To reactivate the OK button, choose the menu item File > Use model:


I have uploaded the corrected version to my site.

 

I figured it out: the error was in the wrong data format. I formatted the data exactly as in your pdf - added a second line with an explanation, a first column with indexes, and erased the name of the predictor column. Everything worked. The model trains in both cases, both on the original data and on the formatted data. But prediction only works if I feed the specially formatted data for training.

Unfortunately, your model did not pass the test on your own data. You previously posted the train.csv and test.csv files and reported good prediction rates. I checked: the prediction error is 50%, i.e. no useful performance at all.

Steps for the test:

1) Format the train.csv file as written above. Then physically split the file in two - train_part1.csv and train_part2.csv. The second file contains the last 20 lines of train.csv; the first file contains everything else, so the data in the two files do not overlap. (A minimal R sketch of this split follows after step 2.)

2) Train the model on train_part1.csv and switch to forecast mode. Feed lines from train_part2.csv one at a time for prediction. I got the right answer only 9 times out of 20 - no miracle.
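A minimal R sketch of the split from step 1, assuming train.csv loads cleanly with read.csv (the file names are the ones used above):

df <- read.csv("train.csv")
n  <- nrow(df)
train_part1 <- df[seq_len(n - 20), ]   # everything except the last 20 rows
train_part2 <- df[(n - 19):n, ]        # only the last 20 rows, no overlap
write.csv(train_part1, "train_part1.csv", row.names = FALSE)
write.csv(train_part2, "train_part2.csv", row.names = FALSE)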

I don't understand this: if your model itself divides the original training file into two parts for training and validation and then does an out-of-sample test, it ends up with a prediction accuracy of 100%. But if you physically divide the training file and do the out-of-sample test manually, the prediction is no better than flipping a coin. If I feed samples from the training set for prediction, the prediction works correctly, so the prediction function itself seems fine. This is all very bad and wrong; you have some serious errors in your code.

Files:
vmr_test.zip  44 kb
 
SanSanych Fomenko:

Only a very superficial acquaintance with R would let you call it a "nag".

Of course, if we install R we see an interpreter of character strings. Going deeper, you can see the bytecode, but that does not solve the interpreter's efficiency problem in any way. At that level there is nothing to even discuss - a nag.

But if you look a bit deeper into R packages, you quickly find that what we see as R code is often a reference to other code. Start digging into it and it turns out that for computationally intensive algorithms R always uses third-party packages chosen on the principle of maximum efficiency - usually C or Fortran libraries.

Or take matrix operations. Considering that R has no notion of a scalar, that everything starts with vectors, and that matrix arithmetic is completely natural for R, using an appropriate library written NOT in R is a matter of principle. The Intel Math Kernel Library is used.
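A small illustration of the point: the R-level code below is one ordinary line per operation, while the actual multiplication is carried out by whatever BLAS the R build is linked against (reference BLAS, OpenBLAS, or MKL - which one is a property of the installation, not of the code):

A <- matrix(rnorm(1000 * 1000), 1000, 1000)
B <- matrix(rnorm(1000 * 1000), 1000, 1000)
system.time(C <- A %*% B)   # timing depends entirely on the linked BLAS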

Add to this that parallelizing calculations - not only across all the cores of one's own computer, but also across neighboring computers - is a routine operation in R.
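For example, with the base parallel package (a minimal local sketch; makePSOCKcluster() with a vector of host names would reach neighboring machines):

library(parallel)
cl <- makeCluster(detectCores())    # one worker per local core
res <- parLapply(cl, 1:8, function(i) sum(rnorm(1e6)))
stopCluster(cl)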

So, what is a "nag" and what is not is a big question.

PS.

You don't have to port anything to R; you just have to learn the basics. R has everything you need, and a lot more besides.

I support SanSanych. Everything you need for any of your ideas is already in R.

Reshetov's arrogant remark is not surprising; that is simply his worldview.

There is no need to try to change his mind. It makes no sense.

 
Dr.Trader:

I don't understand this: if your model itself divides the original training file into two parts for training and validation and then does an out-of-sample test, it ends up with a prediction accuracy of 100%. But if you physically divide the training file and do the out-of-sample test manually, the prediction is no better than flipping a coin. If I feed samples from the training set for prediction, the prediction works correctly, so the prediction function itself seems fine. This is all very bad and wrong; you have some serious errors in your code.

You are right, there is an error in my code. I will correct it.
 
Vladimir Perervenko:

I support SanSanych. Everything you need for any of your ideas is already in R.

I'll argue here, if I may: the "nag" was meant in terms of speed, and I absolutely agree with Yuri...

Yes, most likely you will find a ready-made R solution for almost any problem, but sometimes you hit a wall on sheer speed.

I remember doing a very heavy search over many parameters through a correlation function. At first I used the built-in R function (which, by the way, is written in C++), but it is overloaded with different methods, and one round of my loop took about 3.9 minutes, i.e. 230 seconds. That was not acceptable. The second step was to write my own stripped-down function in R; it already ran in 30 seconds, but that still did not suit me. Since R was my first and only language, I asked a friend to write a corr function for me in C++. Let's compare:

standard R correlation function - 230 sec

self-written R correlation function - 30 sec

function written in C++ - 0.33 sec

So yes, Yuri is right: R is a nag in the context in which he meant it. But this problem is solvable, and in terms of convenience and speed of writing code I think R is far ahead of all other languages, because you almost don't have to write anything yourself; everything is ready-made. That's why I like R...
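For what it's worth, a bare-bones version of such a corr function can be sketched inline with Rcpp. This is a hypothetical illustration, not the actual function mentioned above; unlike cor() it does no NA handling or method dispatch, which is exactly where the per-call overhead difference comes from:

library(Rcpp)
cppFunction('
double corr_cpp(NumericVector x, NumericVector y) {
    // plain Pearson correlation, no input checks
    int n = x.size();
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        sx  += x[i];      sy  += y[i];
        sxx += x[i]*x[i]; syy += y[i]*y[i];
        sxy += x[i]*y[i];
    }
    return (sxy - sx*sy/n) /
           sqrt((sxx - sx*sx/n) * (syy - sy*sy/n));
}')
x <- rnorm(1e6); y <- rnorm(1e6)
corr_cpp(x, y)   # agrees with cor(x, y) up to floating-point error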

 
mytarmailS:

I'll argue here, if I may: the "nag" was meant in terms of speed, and I absolutely agree with Yuri...

Yes, most likely you will find a ready-made R solution for almost any problem, but sometimes you hit a wall on sheer speed.

I remember doing a very heavy search over many parameters through a correlation function. At first I used the built-in R function (which, by the way, is written in C++), but it is overloaded with different methods, and one round of my loop took about 3.9 minutes, i.e. 230 seconds. That was not acceptable. The second step was to write my own stripped-down function in R; it already ran in 30 seconds, but that still did not suit me. Since R was my first and only language, I asked a friend to write a corr function for me in C++. Let's compare:

standard R correlation function - 230 sec

self-written R correlation function - 30 sec

function written in C++ - 0.33 sec

So yes, Yuri is right: R is a nag in the context in which he meant it. But this problem is solvable, and in terms of convenience and speed of writing code I think R is far ahead of all other languages, because you almost don't have to write anything yourself; everything is ready-made. That's why I like R...

So what does this have to do with R?

Once again, R has everything you need to implement your ideas, including the speed.

Another thing is that not everyone knows how to use all of it.

But this is not a problem of the language.

By the way, the convenience and ease of inline C++ functions is amazing.
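To illustrate with a trivial sketch (using Rcpp, one common way to make such inline insertions):

library(Rcpp)
cppFunction('int add_cpp(int a, int b) { return a + b; }')
add_cpp(2, 3)   # returns 5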

Good luck

 
mytarmailS:

I'll argue here, if I may: the "nag" was meant in terms of speed, and I absolutely agree with Yuri...

Yes, most likely you will find a ready-made R solution for almost any problem, but sometimes you hit a wall on sheer speed.

I remember doing a very heavy search over many parameters through a correlation function. At first I used the built-in R function (which, by the way, is written in C++), but it is overloaded with different methods, and one round of my loop took about 3.9 minutes, i.e. 230 seconds. That was not acceptable. The second step was to write my own stripped-down function in R; it already ran in 30 seconds, but that still did not suit me. Since R was my first and only language, I asked a friend to write a corr function for me in C++. Let's compare:

standard R correlation function - 230 sec

self-written R correlation function - 30 sec

function written in C++ - 0.33 sec

So yes, Yuri is right: R is a nag in the context in which he meant it. But this problem is solvable, and in terms of convenience and speed of writing code I think R is far ahead of all other languages, because you almost don't have to write anything yourself; everything is ready-made. That's why I like R...

Let's not replace general questions with a particular example.

If we are talking in general about the efficiency of code in the R programming SYSTEM, I wrote about the basics of that efficiency. In contrast to most programming systems, computationally intensive algorithms in R are usually invoked indirectly, inside higher-level functions, and those algorithms are among the most efficient in programming in general; there is also the possibility of using them directly, e.g. for optimization or GA.

The clearest example of such efficiency is matrix operations, which are among the most computationally intensive operations. A programmer may not even be aware of the library being used, since a matrix operation is just an ordinary line of code.

Besides, what I didn't write about in my post on efficiency is exactly what you wrote about: the ability to insert a piece of C++ code. It should be emphasized that R is structured so that such an insertion becomes an organic addition to the main code. The example you gave is very typical of R.

 
Okay, I propose we close the topic of R's efficiency, otherwise we are beginning to repeat the obvious, and it all boils down to "butter is buttery, the sky is blue, and the grass is green".
 

Hello!

I found a package with a recurrent neural network, https://cran.r-project.org/web/packages/rnn/rnn.pdf, and it was interesting to try it on my data (maybe someone else would be interested in checking it too ;) ). But I ran into something I have never encountered before: since it is a recurrent network, the data are fed to it in a special way, as a 3D array, and even though there is an example, I still cannot understand how it works.

What should the code that builds the "X" variable look like if I have not 2 predictors but 100? Here is a piece of the example:

library(rnn)   # provides int2bin()

# create training numbers
X1 <- sample(0:127, 7000, replace = TRUE)
X2 <- sample(0:127, 7000, replace = TRUE)
# create training response numbers
Y <- X1 + X2
# convert to binary: each integer becomes a row of 8 bits
X1 <- int2bin(X1)
X2 <- int2bin(X2)
Y  <- int2bin(Y)
# create 3D array: dim 1: samples; dim 2: time; dim 3: variables
X <- array(c(X1, X2), dim = c(dim(X1), 2))
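For what it's worth, a hedged guess at the generalization: each predictor matrix becomes one slice along the third dimension, so with N predictors of identical shape the construction might look like this (the names are illustrative):

predictors <- list(X1, X2)   # with 100 predictors: a list of 100 such matrices
stopifnot(length(unique(lapply(predictors, dim))) == 1)   # same samples x time shape
X <- array(unlist(predictors),
           dim = c(dim(predictors[[1]]), length(predictors)))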