Machine learning in trading: theory, models, practice and algo-trading - page 890

 
Aleksey Vyazmikin:

Here's the first test subject, ready: training on 2015-2016, and from 2017 onward pure trading on the selected tree rules. It didn't lose money, which is already good, right?

Compared against trading without the NS (neural network), with training (well, tuning and optimization) on 2016-2017.


I still don't understand the best way to do this: I ended up selecting rules by hand and turning them into code, which is very painstaking manual work... The process needs some kind of automation.
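Automating that rule extraction is straightforward if the tree comes from a standard library. Below is a minimal sketch assuming scikit-learn; the feature names and the toy data are made up for illustration and are not the author's actual setup.

```python
# Hypothetical sketch: extract the decision rules of a fitted tree
# automatically instead of transcribing them by hand.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # stand-in predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in target

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Every leaf is reachable by one path of comparisons; export_text prints
# those paths, which can then be parsed into trading rules.
rules = export_text(tree, feature_names=["donchian_pos", "rsi_h1", "bb_up"])
print(rules)
```

Each printed path ends in a predicted class, so the buy/sell rules can be filtered out of the text programmatically rather than copied by eye.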


Your error went from 10% in training straight to 50% out of sample.

That is either overfitting or predictors peeking into the future.
 
Maxim Dmitrievsky:

You already have it working without a tree :) Try adding the optimizer from my article on the forest; maybe the results will improve.

I have the code if you need it.

I looked at both articles, but I don't understand: what do you mean by the optimizer for the forest?

What kind of mutual information: between the net and the EA without the net, or between the buy and sell signals? I do need the code, but I don't know if I'll be able to figure it out.

 
elibrarius:

Your error went from 10% in training straight to 50% out of sample.

That is either overfitting or predictors peeking into the future.

Not really: I hand-picked only 25 buying rules and 16 selling rules, probably less than 0.1% of all the rules. As I wrote above, the problem, in my opinion, is that there are too many rules, which is not effective.

There's no peeking (not in the logic, at least; an error in the code is possible, but two separate pieces of code are used: one to extract the information in a script, the other to act on the rules as an indicator, so the probability of an error is lower).

Overfitting: yes, maybe so. Globally, my features are from the "follow the trend" family, built for trend detection, and 2017 on Si was almost entirely flat, without global trends: a rather different market.

On the other hand, I collect features from different timeframes and get a coarse-to-fine classification that looks like an inverted pyramid, or zooming: I conditionally divide a month into two parts with subsets, then look at the same week, day and hour within each subset... and this gathered statistics that, together with the other features, turned out to be repeatable on the sample.
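The "repeatable on the sample" check can be sketched simply: apply a candidate rule to two halves of the data and keep it only if its hit rate holds up on both. This is an illustrative sketch with synthetic data, not the author's actual procedure; the percentile threshold is a hypothetical rule.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in data: one feature column and a binary "price went up" target.
feat = rng.normal(size=1000)
target = (feat + rng.normal(scale=0.5, size=1000) > 0).astype(int)

def rule_precision(mask, y):
    """Fraction of rule firings that coincided with the target event."""
    return y[mask].mean() if mask.any() else 0.0

# Hypothetical candidate rule: "feature above its 80th percentile".
rule = feat > np.quantile(feat, 0.8)

half = len(feat) // 2
p_first = rule_precision(rule[:half], target[:half])
p_second = rule_precision(rule[half:], target[half:])

# Keep the rule only if it performs comparably on both subsets.
keep = abs(p_first - p_second) < 0.15 and min(p_first, p_second) > 0.5
print(p_first, p_second, keep)
```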

Buying rules.


Blue is the price's position within the Donchian channel at the moment of the decision, from 0 to 10 in 10% steps; buying is suggested when the price has risen, which is reasonable in general.

Green is the large-scale position within the projected ATR range for the day, week and month, i.e. the large trend, broken down from level -8 to level +8. For example, you can see that at oversold on the monthly TF (level -6) there is only 1 buying rule, while growth is expected from levels -3, -1, -2, -4. That is, probably a lot of the emphasis comes from the fact that the dollar-ruble futures mostly rose more than fell month over month, and there were flips within the bar (a reversal from the opening price after a strong move to one side).

Grey(?) is RSI on the hourly chart: buying is recommended below the 70 level (only once is buying recommended above the 70 level).

Orange (per the legend) is BB_Up, i.e. price above the upper Bollinger band at the open of a new bar: 6 out of 25 rules take current overboughtness as an entry signal, while the remaining 19 prefer the opposite; judging by BB_Down, calm: a shelf or flat.

The yellow TimeH: the preference is to enter at 10 o'clock (4 of 13), i.e. right at the open, and at the close, at 23 o'clock (2 of 13). That's not surprising, because the 10 o'clock open gives a sharp, strong move; the rest (12, 15, 13, 17) are the normal daily session with good volatility, and 20 o'clock is rather an exception to the rule. Maybe if I add days of the week there will be some regularities connected with the weekly news (oil inventories and their forecasts matter for the ruble); I'll try.
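As an illustration of the Blue predictor described above, the position of the price inside the Donchian channel can be discretized into 10% steps roughly like this. This is a sketch; the author's exact definition may differ.

```python
import numpy as np

def donchian_position(high, low, close, period=20):
    """Price position inside the Donchian channel, discretized to 0..10
    (10% steps). A sketch of the 'Blue' predictor described above."""
    upper = max(high[-period:])   # channel top: highest high of the window
    lower = min(low[-period:])    # channel bottom: lowest low of the window
    if upper == lower:
        return 5                  # degenerate flat channel: call it the middle
    frac = (close[-1] - lower) / (upper - lower)
    return int(round(frac * 10))

# Toy bars: a rising market, last close right at the channel top.
high = list(np.linspace(101, 120, 20))
low = list(np.linspace(99, 118, 20))
print(donchian_position(high, low, [120.0]))  # -> 10 (top of the channel)
```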

 

I wanted to keep my vow of silence until I had some good statistics, but I can't watch you persist in being wrong...

Any transformation reduces the information about the series you are looking for. Even a moving average with period 2 already lags and loses a little of the quote information in the process. An NS is a delicate tool working with real numbers, where any digit, even at the 10th decimal place, can be decisive in the final solution. By categorizing your inputs from -1 to 30 (as an example) you completely cut off the fractional part of the number and end up with 31 categories. In reality, the number of distinct values between -1 and 30 grows by an order of magnitude with every decimal place you keep. As a result, with int you have 31 candidate split points, while with double there are vastly more.

If you use categorical int inputs from -1 to 30, the quality of the data itself must be very high for the network to train on it and get a good result. Since ALL your data is built from price, its quality is highly questionable, and on top of that you truncate the real numbers to int, thereby killing the NS's ability to latch onto anything at all.

You can use categories on the input only if the quality of the data is already high enough as it is, which basically rules out using an NS in principle: with good categorical predictors you can build a TS without an NS at all...
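The information-loss argument above can be made concrete: rounding a real-valued feature into integer categories collapses many distinct values into one, so a learner has far fewer candidate split points. A small sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 30.0, size=10_000)  # real-valued feature

# Categorize into integer buckets in -1..30, as in the example above.
x_cat = np.round(x).astype(int)

# Distinct values = candidate split points available to a learner.
n_real = len(np.unique(x))      # ~10000 distinct real values
n_cat = len(np.unique(x_cat))   # only a few dozen categories after rounding
print(n_real, n_cat)
```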

Well, that's that... Just thinking out loud... My heart bleeds when I look at this nonsense... Even the vow of silence is broken...

 

If we are comparing one lake to another, does the exact metric matter to us? No. Of course, if we are comparing a lake to something that is not a lake, the answer can differ: it could be a pond or a puddle, but should we be afraid of getting our feet wet in a puddle on the way to the lake? Personally I see no point in exact categories; maybe it matters for an NS that can analyze information lengthwise and crosswise, but I don't have one, and for a tree it is more than enough, as I see now.

 
Aleksey Vyazmikin:

If we are comparing one lake to another, does the exact metric matter to us? No. Of course, if we are comparing a lake to something that is not a lake, the answer can differ: it could be a pond or a puddle, but should we be afraid of getting our feet wet in a puddle on the way to the lake? Personally I see no point in exact categories; maybe it matters for an NS that can analyze information lengthwise and crosswise, but I don't have one, and for a tree it is more than enough, as I see now.

To use categorical inputs, the quality of those inputs must be very good. If the quality is poor, it is better not to convert them to categories but to feed the real values of the indicators themselves. That way the NS will have more options to divide the space adequately, IMHO!

 

Okay, I want to say a special thanks to FOCUSNIK!!!!

I didn't think it would come to this, but your advice really turned out to be key in preparing the predictors. So kudos to you, you son of a... !!!! (no offense)

I will definitely post a video later in which I will mention you... So wait for a video from Mihail :-) where I will talk about my understanding of the field of ML in general. I think the video will be interesting not only for beginners but also for veterans... so... wait for it!!!!

 
Mihail Marchukajtes:

To use categorical inputs, the quality of those inputs must be very good. If the quality is poor, it is better not to convert them to categories but to feed the real values of the indicators themselves. That way the NS will have more options to divide the space adequately, IMHO!!!

How do you propose to measure quality?

 
Mihail Marchukajtes:

To use categorical inputs, the quality of those inputs must be very good. If the quality is poor, it is better not to convert them to categories but to feed the real values of the indicators themselves. That way the NS will have more options to divide the space adequately, IMHO!

31 categories... no, that is rather a discretization with 31 steps. It is used in one of Vladimir's articles, and the result is no worse.
 
Aleksey Vyazmikin:

How do you propose to measure quality?

First, record the moment when the decision is made. Call it an event. Then, at exactly the moment the event occurs, save the values of the indicators.
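The procedure above (fix the event, then snapshot the indicator values at that moment) can be sketched as a small event log. The indicator names here are hypothetical, taken from the predictors discussed earlier in the thread.

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Collects one row of indicator values per trading event, so that
    input quality can later be measured against the trade outcome."""
    rows: list = field(default_factory=list)

    def record(self, timestamp, indicators, outcome=None):
        # Snapshot the indicator values exactly at the event moment.
        self.rows.append({"time": timestamp, **indicators, "outcome": outcome})

log = EventLog()
# At the moment a buy decision fires, save the indicator snapshot.
log.record("2017-03-01 10:00",
           {"donchian_pos": 8, "rsi_h1": 64.2, "bb_up": 1},
           outcome=None)  # outcome is filled in later, once the trade closes
print(log.rows[0]["donchian_pos"])  # -> 8
```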

To be honest, I don't quite understand your table. What is in it?
