Machine learning in trading: theory, models, practice and algo-trading - page 645

 
SanSanych Fomenko:

It makes no sense to run tests on the raw price series, because it is obvious by eye that the series is non-stationary.

What is interesting (not to me: I always use it) are the graphs for the log-return series log(p/p-1).

What's in there? And of course you need a scale on the ordinate axis.

I left the axis scale off so that two graphs would fit in one frame and save space, but the Y coordinates were originally different.

The result is quite different from last time; here are the most interesting graphs. The rest are in the archive, so that I don't have to cram ten pictures in here. But the entropy graph is not interesting at all.

Script attached; in RStudio you can scroll back and forth through the history of all the plots.

Oops, a typo in the code again; I've reattached the .txt file.

 
SanSanych Fomenko:


We discussed principal components, and you saw the drawback that the algorithm is unsupervised.

Here's a supervised one:

Package spls.

Thanks, I would never have guessed it from the CRAN description (Sparse Partial Least Squares (SPLS) Regression and Classification).
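For what it's worth, a minimal sketch of fitting an SPLS model on the yeast example data that ships with the package (the K and eta values here are illustrative, not tuned):

library(spls)

data(yeast)                            # example dataset shipped with spls
# K = number of latent components, eta = sparsity parameter in (0, 1)
fit <- spls(yeast$x, yeast$y, K = 2, eta = 0.7)
print(fit)
coef(fit)   # sparse coefficient matrix: rows of zeros = discarded predictors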

 
Dr. Trader:

I left the axis scale off so that two graphs would fit in one frame and save space, but the Y coordinates were originally different.

The result is quite different from last time; here are the most interesting graphs. The rest are in the archive, so that I don't have to cram ten pictures in here. But the entropy graph is not interesting at all.

Script attached; in RStudio you can scroll back and forth through the history of all the plots.

Oops, a typo in the code again; I've reattached the .txt file.

Great pics!

You can tell from the ARCH test that there are stretches where ARIMA models work. But there is always one problem: we are all very smart about history, and we learn that we could have used ARIMA only after the stretch has already passed! And so it is with all our theories: heavy hindsight bias.
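Since the thread keeps referring to an ARCH test on log returns, here is a minimal sketch using FinTS::ArchTest as a stand-in for the archTest() in the attached script (the price series below is a simulated placeholder):

library(FinTS)

set.seed(1)
p <- 100 * exp(cumsum(rnorm(1000, sd = 0.01)))  # placeholder geometric random walk
r <- diff(log(p))                               # log returns, i.e. log(p/p-1)
ArchTest(r, lags = 12)                          # H0: no ARCH effects in the returns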

 
Dr. Trader:

To follow up on this - https://www.mql5.com/ru/forum/86386/page643#comment_6472393


The predictor-screening function random.forest.importance() showed quite decent results in some tests. The inconvenient part is that, according to it, every predictor is at least a little bit important... but if, for example, you compute the mean importance and keep only the predictors whose importance is above the mean, you get very good results.

Which importance is it, Gini or permutation (MDA)?

P.S. There are other methods too; maybe you can try to compare them as well: http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/

Selecting good features – Part IV: stability selection, RFE and everything side by side
  • 2014.12.20
  • blog.datadive.net
In this post, I'll look at two other methods: stability selection and recursive feature elimination (RFE), which can both be considered wrapper methods. They both build on top of other (model-based) selection methods such as regression or SVM, building models on different subsets of data and extracting the ranking from the aggregates. As a wrap-up...
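As an aside, a minimal sketch of the above-mean importance filter Dr. Trader describes, using FSelector::random.forest.importance() on iris as a placeholder dataset:

library(FSelector)

data(iris)
# importance.type = 1 -> mean decrease in accuracy (permutation/MDA)
# importance.type = 2 -> mean decrease in node impurity (MDI)
imp  <- random.forest.importance(Species ~ ., iris, importance.type = 1)
keep <- rownames(imp)[imp$attr_importance > mean(imp$attr_importance)]
print(keep)   # predictors whose importance is above the mean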
 
Dr. Trader:

I found another interesting package for screening predictors, called FSelector. It offers about a dozen methods for selecting predictors, including one based on entropy.

I took the file with the predictors and the target from here - https://www.mql5.com/ru/forum/86386/page6#comment_2534058


I plotted each method's assessment of the predictors in the graph at the end.

Blue is good, red is bad (for corrplot the results were scaled to [-1, 1]; for the exact values see the output of the calls cfs(targetFormula, trainTable), chi.squared(targetFormula, trainTable), etc.).
You can see that X3, X4, X5, X19, X20 are rated well by almost all methods; you can start with them and then try adding/removing others.

However, the models in rattle did not pass the test on Rat_DF2 with these 5 predictors; once again, no miracle happened. That is, even with the remaining predictors you still have to tune model parameters, do cross-validation, and add/remove predictors yourself.

FSelector comes from WEKA, which means it runs on Java and consumes a lot of memory. FSelectorRcpp is a better choice.

Good luck
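A minimal sketch of swapping the two packages on the same entropy-based criterion (iris is just a placeholder dataset; note the dot vs. underscore in the function names):

library(FSelector)       # WEKA-derived, needs Java, heavy on memory
library(FSelectorRcpp)   # native C++ reimplementation

data(iris)
ig_weka <- FSelector::information.gain(Species ~ ., iris)
ig_rcpp <- FSelectorRcpp::information_gain(Species ~ ., iris)
print(ig_weka)
print(ig_rcpp)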

 

Here are entropy(price) and archTest(log(p/p-1)) plotted together. To the eye they don't seem to correlate, and I don't see any signals. Someone with an eye for indicators may notice something.


 
Maxim Dmitrievsky:

Which importance is it, Gini or permutation (MDA)?

There are 2 types to choose from:
1 = mean decrease in accuracy (this is probably MDA; the initials match)
2 = mean decrease in node impurity

 
Dr. Trader:

There are 2 types to choose from:
1 = mean decrease in accuracy (this is probably MDA; the initials match)
2 = mean decrease in node impurity

Yep, that's it, thanks. The second one is MDI.

 
Dr. Trader:

Here are entropy(price) and archTest(log(p/p-1)) plotted together. To the eye they don't seem to correlate, and I don't see any signals. Someone with an eye for indicators may notice something.


It's a simple volatility indicator.)

But the ARCH test doesn't show anything.

 

I see there is real interest in evaluating the importance of predictors.

The most varied set of tools is in the CORElearn package (at one time Vladimir Perervenko strongly recommended it to me).

It has several functions for evaluation.

At the first stage there is the function:

ordEval(formula, data, file=NULL, rndFile=NULL,
variant=c("allNear","attrDist1","classDist1"), ...)

ordEval computes the resulting probabilistic factors corresponding to the effect of increasing/decreasing an attribute's value on the class.
The algorithm evaluates strictly dependent ordered attributes, in which the values of individual attributes depend on other attributes in different ways.

At the second stage there is the function:

attrEval(formula, data, estimator, costMatrix = NULL, ...)

estimator: the name of the evaluation method. The 37 available names are listed below.

 [1] "ReliefFequalK"      "ReliefFexpRank"     "ReliefFbestK"       "Relief"
 [5] "InfGain"            "GainRatio"          "MDL"                "Gini"
 [9] "MyopicReliefF"      "Accuracy"           "ReliefFmerit"       "ReliefFdistance"
[13] "ReliefFsqrDistance" "DKM"                "ReliefFexpC"        "ReliefFavgC"
[17] "ReliefFpe"          "ReliefFpa"          "ReliefFsmp"         "GainRatioCost"
[21] "DKMcost"            "ReliefKukar"        "MDLsmp"             "ImpurityEuclid"
[25] "ImpurityHellinger"  "UniformDKM"         "UniformGini"        "UniformInf"
[29] "UniformAccuracy"    "EqualDKM"           "EqualGini"          "EqualInf"
[33] "EqualHellinger"     "DistHellinger"      "DistAUC"            "DistAngle"
[37] "DistEuclid"


The additional costMatrix parameter can supply a non-uniform cost matrix for the cost-sensitive classification measures
(ReliefFexpC, ReliefFavgC, ReliefFpe, ReliefFpa, ReliefFsmp, GainRatioCost, DKMcost, ReliefKukar, and MDLsmp).
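A minimal sketch of the two-stage evaluation described above, run on the artificial ordinal data generator that ships with CORElearn (not the poster's own data):

library(CORElearn)

# Stage 1: ordEval on ordered attributes
dat <- ordDataGen(200)            # artificial ordinal dataset from the package
oe  <- ordEval(class ~ ., dat)
printOrdEval(oe)

# Stage 2: attrEval with one of the 37 estimators listed above
imp <- attrEval(class ~ ., dat, estimator = "ReliefFequalK")
print(sort(imp, decreasing = TRUE))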



As you can see, there is plenty of room for exercises in determining the importance of predictors.
