Yur, please make a normal (or upgraded) version of the program for ordinary people)))
1. Let it read standard files of this type: a header (with or without quotation marks), then the data, with a comma, semicolon, or tab as the delimiter. Example:
"Date";"V1";"A77";"V23A";"Target"
01.01.2000;4.999995E-03;1.499891E-03;-2.000213E-03;-5.000234E-03;1
2. Let the model use only 10 inputs, but make it possible to load many more, and then
choose the inputs in a separate tab, with checkboxes, or in some other way.
3. No normalization! (Or make it switchable.) If the data is already centered on 0, then after a conversion like
(x - x.min)/... (or whatever you use) the centering is lost)))
4. Remove the randomness (or make it switchable): let it simply split the data into two parts, process them in order, and display the % of successful predictions for 1 (not needed for 0).
5. You added the importance of inputs: good!
6. In addition to the formula that is output now, let it output the full one! That is, right now we have:
double x2 = 2.0 * (v5 + 1.0) / 2.0 - 1.0;
double decision = -0.2632437547312642 -0.2634178652535958 * x2
+ 0.05267978803936412
what we need is:
Name target = -0.2632437547312642 -0.2634178652535958 * insert construct x2= (2.0 * (v5 + 1.0) / 2.0 - 1.0)
+ 0.05267978803936412
which gives:
Name target = -0.2632437547312642 -0.2634178652535958 * (2.0 * (v5 + 1.0) / 2.0 - 1.0)
+ 0.05267978803936412
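Point 6 could probably be done as a plain text substitution. Below is a minimal Python sketch, assuming the intermediate variables are available as name/expression strings. `inline_definitions` is a hypothetical helper, not something jPrediction provides, and the formula contains only the part of the expression visible above.

```python
import re

def inline_definitions(definitions, formula):
    """Replace each intermediate variable in `formula` with its parenthesized definition."""
    for name, expr in definitions.items():
        # \b keeps "x2" from matching inside longer names such as "x21"
        pattern = rf"\b{re.escape(name)}\b"
        # A function replacement avoids any backslash handling in the expression text
        formula = re.sub(pattern, lambda m, e=expr: f"({e})", formula)
    return formula

# Hypothetical inputs copied from the visible part of the generated code
definitions = {"x2": "2.0 * (v5 + 1.0) / 2.0 - 1.0"}
formula = "-0.2632437547312642 - 0.2634178652535958 * x2 + 0.05267978803936412"

full = inline_definitions(definitions, formula)
print("target =", full)
```

The parentheses around each inserted expression matter: without them the multiplication by the coefficient would apply only to the first term of the definition.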
That way you would get a tool that is at least somewhat suitable for quick research. Later you could also add other simple (non-greedy)
algorithms, with a choice between them... Otherwise the tool is of little use. By the time the file has been edited, this and that... it's easier to use something else...
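For point 1 (reading files with different delimiters and an optionally quoted header), here is a rough Python sketch using only the standard `csv` module. `read_table` is a hypothetical name, the delimiter is guessed from the header line, and the sample is a shortened version of the example above. The guess will go wrong if the header itself contains commas for another reason, so treat it as an assumption, not a robust detector.

```python
import csv
import io

def read_table(text, delimiters=";,\t"):
    """Parse delimited text whose first row is a header, quoted or not."""
    first_line = text.splitlines()[0]
    # Guess the delimiter: the candidate that occurs most often in the header line
    delimiter = max(delimiters, key=first_line.count)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    return header, data

# Shortened illustrative sample: quoted header, semicolon-delimited data
sample = '"Date";"V1";"A77"\n01.01.2000;4.999995E-03;1.499891E-03\n'
header, data = read_table(sample)
print(header)
print([float(value) for value in data[0][1:]])
```

The `csv` reader strips the quotation marks by itself, so the same function handles headers with and without them.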
Where's version 7?
So-so - just not enough brains and time.
You have to start with the target variable, then select predictors for it, then double-check with mathematics, and so on. In any case, the process is tedious and I can't formalize it.
4. Remove the randomness (or make it switchable): let it simply split the data into two parts, process them in order, and output the % of successful predictions for 1 (not needed for 0).
If you are trying to use jPrediction for forex, this is probably the main thing to add. Many models can split the data randomly into two parts, train on the first, and show good results on the second as well. But most of these models will be ineffective on new data. There are no constant dependencies in forex, so good results on a random test sample do not guarantee good results on new data. The only way to make sure a model is suitable for forex is a roll forward test:
The model should show good predictive ability on the data marked in red. Right now in jPrediction such a test has to be done manually, entering the data anew for each test case, which is no good for serious work.
And I agree that it's better to show a score for the test data only than a total for training + test.
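The non-random split and the "score for 1 only" idea could look roughly like this in Python. Both function names are hypothetical, and I am reading "% of successful predictions for 1" as the share of predicted 1s that turned out to be 1 (precision for class 1), which is an assumption about what was meant.

```python
def ordered_split(rows, train_fraction=0.5):
    """Split rows in their original order: first part for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def hit_rate_for_ones(actual, predicted):
    """Percentage of predicted 1s that were actually 1; None if the model never predicted 1."""
    outcomes = [a for a, p in zip(actual, predicted) if p == 1]
    if not outcomes:
        return None
    return 100.0 * sum(1 for a in outcomes if a == 1) / len(outcomes)

# Toy labels for illustration only, not real results
train, test = ordered_split(list(range(10)))
actual = [1, 0, 1, 1, 0, 1]
predicted = [1, 1, 0, 1, 0, 1]
print(hit_rate_for_ones(actual, predicted))  # 75.0
```

The score would be computed on the second (test) part only, never on training + test together.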
All the scores that jPrediction currently produces are inadequate for forex and only mislead.
By the way, here's how roll forward is done in the caret package for R:
http://topepo.github.io/caret/splitting.html (Data Splitting for Time Series section)
The model is trained on the first 1000 examples, then tested on examples 1001-1300. Then the window shifts by 300: training on 301-1300, testing on 1301-1600, and so on until the examples run out.
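The same windowing is easy to sketch in Python without any library. This is only an illustration of the 1000/300/300 scheme described above, not caret's implementation. `walk_forward_windows` is a hypothetical name, and the total of 1900 examples is an arbitrary number chosen for the demo.

```python
def walk_forward_windows(n_samples, train_size=1000, test_size=300, step=300):
    """Yield (train, test) pairs of 0-based half-open index ranges, oldest window first."""
    start = 0
    # Stop as soon as a full train + test window no longer fits
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

windows = list(walk_forward_windows(1900))
for train, test in windows:
    # Print 1-based inclusive bounds to match the numbering used in the text
    print(f"train {train.start + 1}-{train.stop}, test {test.start + 1}-{test.stop}")
```

With 1900 examples this prints three windows: train 1-1000 / test 1001-1300, train 301-1300 / test 1301-1600, and train 601-1600 / test 1601-1900.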
If someone doesn't like something in jPrediction, the project is open source under the GNU GPL license, and forking it, changing and modifying things there, is not only not forbidden but to some extent even welcomed.
All of the scores that jPrediction gives out now are inadequate for forex and only mislead.
These are unsubstantiated statements, i.e. what is commonly referred to as bullshit. To prove the effectiveness of Walk Forward, be kind enough to present the results of comparative studies of models obtained in jPrediction and models obtained after Walk Forward, in a form that others can cross-check for flaws. If such results confirm your words, then it would make sense to replace the testing algorithm currently implemented in jPrediction with Walk Forward.
I have repeatedly seen seemingly "obvious" ideas turn out to be "empty" once they were implemented and tested in practice. Only a very small share of ideas is effective, and often only after additional "filing down", because ideas lack details. And the devil is in the details.
Linus Torvalds (creator of the Linux kernel) never gets tired of saying that "theory meets practice sooner or later. And practice is always the criterion of truth. Always!"
An additional problem of Walk Forward is that we get a lot of models at each stage. Which of these models should be kept as a working model and by what criteria should it be selected?
There is no need to quarrel. We are working on a very complex object of study. There are a lot of assumptions. And those assumptions lead to overfitting and a collapse in out-of-sample results.
Two simple rules: fool the market, don't fool yourself.
Walk Forward is a good method, of course. But its main parameters are the training depth and the test length. Moreover (!!!), if these parameters are tuned so that the forward result is the best, we will overfit Walk Forward itself! Therefore, even such a good method needs out-of-sample data. That is, on one large chunk we run Walk Forward optimization repeatedly. On another, non-overlapping chunk, we take the best Walk Forward parameters found on the "training" chunk and run one more full walk forward, but only once. If the result is good, the model picks up real dependencies. If the result is bad, we have simply overfitted Walk Forward on a not very good model.
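The two-chunk procedure could be sketched like this in Python. Everything here is an assumption made for illustration: the function names, the 50/50 split between the chunks, shifting the window by the test length, and especially `majority_accuracy`, which is a toy stand-in for training and scoring a real model.

```python
from itertools import product
from statistics import mean

def walk_forward_score(series, train_size, test_size, score_fn):
    """Average score_fn over consecutive walk-forward windows; None if no window fits."""
    scores = []
    start = 0
    while start + train_size + test_size <= len(series):
        train = series[start:start + train_size]
        test = series[start + train_size:start + train_size + test_size]
        scores.append(score_fn(train, test))
        start += test_size  # shift the window by the test length
    return mean(scores) if scores else None

def tune_then_validate(series, train_sizes, test_sizes, score_fn):
    """Tune window sizes on the first chunk, then run one walk forward on the second."""
    half = len(series) // 2
    tuning, holdout = series[:half], series[half:]
    # Step 1: repeated walk forward on the first chunk, once per parameter pair
    candidates = []
    for train_size, test_size in product(train_sizes, test_sizes):
        score = walk_forward_score(tuning, train_size, test_size, score_fn)
        if score is not None:
            candidates.append((score, train_size, test_size))
    tuning_score, train_size, test_size = max(candidates)
    # Step 2: a single walk forward on the untouched chunk with the chosen sizes
    holdout_score = walk_forward_score(holdout, train_size, test_size, score_fn)
    return (train_size, test_size), tuning_score, holdout_score

def majority_accuracy(train, test):
    """Toy scorer: predict the majority label seen in train, return accuracy on test."""
    guess = 1 if sum(train) * 2 >= len(train) else 0
    return sum(1 for y in test if y == guess) / len(test)

labels = [1, 1, 0, 1] * 20  # toy labels, not market data
params, tuning_score, holdout_score = tune_then_validate(
    labels, [8, 16], [4, 8], majority_accuracy
)
print(params, tuning_score, holdout_score)
```

The point of the structure is that the second chunk is touched exactly once, so a large gap between `tuning_score` and `holdout_score` suggests the window sizes were fitted to the first chunk.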
We have the same problem when testing once on a held-out sample. I'm working on it now: ridding my results of overfitting. The results on cross-validation should correlate with the results on the held-out sample used for final testing. In that case, selecting the best model by cross-validation will give approximately the best out-of-sample results. Otherwise, if there is no correlation or a negative one, we have a model that is inadequate for forex and needs to be changed.
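The correlation check can be done with a plain Pearson coefficient over a set of candidate models. A small Python sketch follows. The scores are made-up placeholders, one pair per hypothetical model, and `pearson` is written out by hand to keep the example self-contained.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Placeholder scores, one pair per candidate model (illustrative numbers only)
cv_scores = [0.52, 0.55, 0.58, 0.61]
holdout_scores = [0.50, 0.53, 0.54, 0.57]

r = pearson(cv_scores, holdout_scores)
print(f"correlation between cross-validation and held-out scores: {r:.2f}")
```

A clearly positive value means that ranking models by cross-validation is informative about the held-out sample. A value near zero or below is the case described above, where the modeling approach itself should be changed.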
