Machine learning in trading: theory, models, practice and algo-trading - page 694

 

I had a long post, but the moderator deleted it because I couldn't contain myself and expressed my feelings in the typical Russian manner... so I'll be brief.

To be honest, I was simply blown away by the results I got. Yes, out of 110 inputs only 3 to 5 were selected in different circumstances, and accordingly the resulting models turned out fairly small, but the way they worked was really something. First of all, ALL the models built on the selected inputs passed my test at 100%. Before this I managed to get such a result only once, and that was by accident, while here everything is stable...

Yes, the polynomials themselves turned out small, but who the hell cares. All means are good in war. So we'll hit the market with small polynomials; if they work, it's not my fault...

But again, this is all just a quick test and more time is needed to confirm the theory; still, here is a vivid example of the TS working on the EP section over the last two weeks.

It's certainly not the grail discussed in the neighboring thread, but the result is obvious: equity is consistently above zero over the whole CB block!!!!

I want to thank everyone who responded to my request for help with R. You really helped me save a lot of time. Now I just need to get my signal into the green zone!!!

 
Dr. Trader:

For example, jPrediction scales the data to the interval [-1;1], and learns on these numbers. You can also scale to the same interval in R before evaluating the inputs.
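For example, something like this in R (just a sketch of a column-wise min-max transform; the data frame and column names are made up):

# rescale every column of a data frame to [-1;1] with a min-max transform
scaleToUnit <- function(df) {
   as.data.frame(lapply(df, function(col) {
      rng <- range(col, na.rm = TRUE)
      2 * (col - rng[1]) / (rng[2] - rng[1]) - 1   # min -> -1, max -> +1
   }))
}

predictors <- data.frame(a = rnorm(100, sd = 5), b = runif(100, 0, 1000))   # toy inputs
scaled     <- scaleToUnit(predictors)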

Tree-based estimation methods probably won't change the result; forests don't really care what interval the data comes in, but it's better to check. vtreat isn't picky about the interval either.


But in general, he is talking about a non-linear transformation of the inputs before they are even fed into the neural network. A neural network is very sensitive to its inputs, and if you process the input data in some special way, its results may improve. For example, I've heard of this trick: pass the inputs through a sigmoid.

The models it builds already include such a function, if I'm not mistaken. It looks like this:

double signum(double x) { return (x > 0.0) ? 1.0 : ((x < 0.0) ? -1.0 : 0.0); }   // MQL has no built-in signum

// for |x| < 1 returns 2*signum(x) - x, otherwise saturates at signum(x)
double sigmoid(double x) {
   if (MathAbs(x) < 1.0) {
      return 2.0 * signum(x) - x;
   }
   return signum(x);
}
 
Mihail Marchukajtes:

It looks like this....

Some kind of special "Reshetov Sigmoid" :)

Here is an interesting picture with different data transformations.
They say that if all predictors have boxplots of roughly the same scale, the neural network will be very happy and learn easily. The last third of the predictors, on scale()x2->Sigmoid(), looks nice, but something needs to be done with the first half of the predictors or the network will choke.
And if you look at the boxplot for scaling to [-1;1] as in jPrediction, it looks really bad.
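Roughly how such a comparison can be made in R (a sketch only: the predictor matrix is synthetic and the logistic function stands in for whatever "Sigmoid()" is in the picture):

# synthetic predictors with very different scales (column i has sd = i)
x  <- sapply(1:10, function(i) rnorm(1000, sd = i))

v1 <- apply(x, 2, function(c) 2 * (c - min(c)) / (max(c) - min(c)) - 1)   # [-1;1] as in jPrediction
v2 <- scale(x)                                                            # centered, unit variance
v3 <- 1 / (1 + exp(-2 * scale(x)))                                        # scale()x2 -> Sigmoid()

boxplot(cbind(v1, v2, v3), las = 2,
        main = "[-1;1] scaling vs scale() vs scale()x2->Sigmoid()")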


Files: (attached image: boxplots of the predictors under the different transformations)
 

Today is grail day, but we know what it looks like and how much work it takes to hold it in our hands!!!!

I'm not afraid of the word: today I found the grail for myself. I ran a lot of tests and the results were amazing. Special thanks to Dr. Trader for his support, which actually led to the discovery. With R I was able to effectively find a set of important predictors, and since the target has the same number of classes, by playing with it a little (adding or removing one) the set of important predictors can be expanded by one or two columns. I tried it once and adding them worked just fine. Then we start tuning and select the model with the maximum training score.


The small size of the polynomial bothers me a bit, but in theory it should work for 50% of the training interval, that is, a week, and that's enough for me!!! But here's the thing... and now I'm addressing those who are looking for reliable and stable patterns. It's easier to explain with an example...

I keep a data table of 1000 rows and 111 columns: 110 predictors and one output. BUT I don't take the whole table; I take a small fresh section of 40 records (roughly two weeks of the TS's work). As a result I have a training set of 40 by 110 plus the target. In effect I take a slice of the market on this particular day, over this particular interval. This slice is stationary. Then I select the input variables that are significant with respect to the output and get from 3 to 5 columns which, as I understand it, hold the proverbial alpha that gives an advantage over other market participants. And now the most important thing, the whole point of this discussion: as soon as I add one more row to the training table, the set of columns changes dramatically, that is, the alpha shifts to another set of columns. Maybe not immediately, but after adding not one but several rows, i.e. several TS signals. Alpha is exactly that pattern in its purest form, minimal and sufficient for the target function. But this pattern is not explicit; seeing it with the naked eye is extremely difficult. This is the stage where the AI steps in and does its job.
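In R terms the idea looks roughly like this (only a sketch with a generic per-column significance filter; the actual selection method may well be different, and the object names are made up):

# toy stand-in for the real 1000 x 111 table: 110 predictors plus a binary target
fullTable        <- as.data.frame(matrix(rnorm(1000 * 110), ncol = 110))
fullTable[[111]] <- rbinom(1000, 1, 0.5)

win    <- tail(fullTable, 40)    # the freshest 40 records, ~2 weeks of TS work
target <- win[[111]]
preds  <- win[, 1:110]

# p-value of each predictor in a univariate logistic regression against the target
pvals <- sapply(preds, function(col) {
   fit <- glm(target ~ col, family = binomial)
   summary(fit)$coefficients[2, 4]
})

selected <- names(pvals)[pvals < 0.05]   # typically only a handful of columns survive on a given slice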

Now imagine how the alpha can jump around the whole data field that I dump, if it is rarely contained in more than five inputs out of a total field of 110. In other words, with each new slice I get a completely different set of predictors. And how are you supposed to keep up with that over a horizon of a YEAR, when even over weeks you can hardly catch it properly?! But you are absolutely right: the grail exists, but everyone has his own, and to hold on to it you have to make quite a bit of effort...

And again, addressing the theoreticians of demo accounts: this is how it's done...

I've been working on the theory and ran some tests with it; the tests showed good results. The models are trained and loaded into the robot. Watch my signal this week and you will see what my assumptions are worth.

 

Wonderful!

Glory to R!

 
Mihail Marchukajtes:

Now imagine how the alpha can jump around the whole data field that I dump, if it is rarely contained in more than five inputs out of a total field of 110. In other words, with each new slice I get a completely different set of predictors.

Now turn on your brain )

 
Combinator:

Now turn on your brain )

If you really follow your advice and think about it, there is only one answer. The slice is stationary, and any change to it simply throws you into another dimension (figuratively speaking), where the laws are completely different. That's why it's so difficult to make money in the market. This is what is called non-stationarity: the information that can predict makes up only 5% of the maximum possible data set, and with the next signal it may change drastically, or not change at all. I store the delta and volume for 11 instruments, of which only 5% work here and now, and it is not known when they will be replaced; but it is clear that when other columns are substituted this moment can be tracked, and thus the moment when the model goes flat can be determined... I need to run more tests... and I don't have time...

 

OOS, 15 min tf

Found some bugs in strategy #2, fixed them, and it seems to be working now.

There is also strategy #3 with RL added, which I feel has a lot of potential, but I will have to think hard about the implementation.


 

Interesting article on a study of eight machine learning models

Specifically, we consider the following algorithms: multilayer perceptron (MLP), logistic regression, naïve Bayes, k-nearest neighbors, decision trees, random forests, and gradient-boosting trees. These models are applied to time series from eight data generating processes (DGPs), reflecting different linear and nonlinear dependencies (base case). Additional complexity is introduced by adding discontinuities and varying degrees of noise.


Here are the results

First, we find machine learning models to achieve solid performance on unknown underlying DGPs, compared to the ambition level set by optimal forecasts. In absence of noise (base case), the results achieved with the machine learning models almost resemble those of the optimal forecast. Model-wise, MLPs and GBTs provide the best results for both, linear and nonlinear DGPs. For processes with no or only small nonlinearities, the LR presents a good alternative, especially when considering its low computational cost. We find NB and single decision trees to deliver worse performance and hence recommend the aforementioned techniques for time series prediction tasks.

Second, it is better to include too many lagged values in the feature space than to include too few. We find most machine learning models to be fairly robust to exceeding the number of required lags suggested by the process equation of the DGP. In the case of the RF, the inclusion of additional features even increases the predictive accuracy. We recommend starting with one lag and gradually increasing the number of lags while monitoring the performance on a hold-out set or with cross-validation.
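A minimal R sketch of that lag recommendation (the series, the 80/20 split and the random forest are assumptions for illustration, not taken from the paper):

library(randomForest)

y <- as.numeric(arima.sim(list(ar = 0.7), n = 500))   # hypothetical series

# hold-out mean absolute error for a given number of lags
evalLags <- function(nLags) {
   m      <- embed(y, nLags + 1)              # column 1 = current value, the rest = lags
   target <- m[, 1]
   feats  <- m[, -1, drop = FALSE]
   colnames(feats) <- paste0("lag", 1:nLags)
   trainIdx <- 1:floor(0.8 * nrow(m))         # first 80% train, last 20% hold-out
   fit  <- randomForest(x = feats[trainIdx, , drop = FALSE], y = target[trainIdx])
   pred <- predict(fit, feats[-trainIdx, , drop = FALSE])
   mean(abs(pred - target[-trainIdx]))
}

sapply(1:8, evalLags)   # add lags one at a time and watch the hold-out error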

Third, we find jumps to have a very strong negative effect on predictive accuracy, with LR being the most robust machine learning model. To mitigate the negative effects, both adding first differences to the feature space (DIFF) as well as removing jumps based on the LOF algorithm have shown good results. We recommend the combination of both techniques.
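For the jump-removal part, a rough R sketch (it assumes the dbscan package's lof() function and a hand-picked score threshold; the paper does not prescribe a particular implementation):

library(dbscan)

y <- cumsum(rnorm(500))
y[250:500] <- y[250:500] + 15        # hypothetical level shift (jump) half-way through
dy <- diff(y)                        # first differences (DIFF)

lofScores <- lof(matrix(dy, ncol = 1), minPts = 10)   # local outlier factor of each increment
cleanIdx  <- which(lofScores < 1.5)                   # arbitrary threshold: scores near 1 are "normal"
dyClean   <- dy[cleanIdx]                             # increments with the jump-like outlier removed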

Fourth, polluting the time series with noise has the most detrimental effect on predictive accuracy across all machine learning models. Again, we find that the LR is the most robust machine learning model in the presence of noise. Moreover, additional mitigation measures, such as the inclusion of first differences (DIFF) and moving averages (MA) in the feature space yield improved results.
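And the DIFF and MA features from that last point, sketched in base R (window lengths are arbitrary choices, not from the paper):

y <- as.numeric(arima.sim(list(ar = 0.5), n = 500)) + rnorm(500)        # hypothetical noisy series

diffFeat <- c(NA, diff(y))                                              # first differences (DIFF)
maFeat5  <- as.numeric(stats::filter(y, rep(1/5, 5),  sides = 1))       # 5-bar moving average (MA)
maFeat20 <- as.numeric(stats::filter(y, rep(1/20, 20), sides = 1))      # 20-bar moving average

features <- na.omit(data.frame(y, diffFeat, maFeat5, maFeat20))         # add these alongside the lags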

 
SanSanych Fomenko:

Interesting article on a study of eight machine learning models

Model-wise, MLPs and GBTs provide the best results for both, linear and nonlinear DGPs. For processes with no or only small nonlinearities, the LR presents a good alternative, especially when considering its low computational cost. We find NB and single decision trees to deliver worse performance and hence recommend the aforementioned techniques for time series prediction tasks.

Just like Captain Obvious, considering that CART doesn't work on purely linear problems at all
