Machine learning in trading: theory, models, practice and algo-trading - page 819

 
Maxim Dmitrievsky:

1. Decide on the sample size; your sample is very small.

2. Study simple (linear) classification/regression models first; most likely they will suit you. If the error is large, you can try switching to more complex (non-linear) ones. They are available in the alglib library in the terminal (decision trees and forests are there too).

3. Never give any advice, especially about using R :))) It's time to banish them from this forum.

4. If the problem can be solved without machine learning, it is better not to use it.

1. Yes, this is just an example; the real sample is large, of course.

2. Thank you, of course it is better to start with something simple. I think this is a new stage in my development in data mining.

3. So it would be good to know about the alternatives in MQL...

4. I am still looking for a way to cluster features (patterns) by their influence on trading. I am afraid I have started to develop my trading system (TS) in the direction of curve fitting, so I want to accumulate ideas in the form of features of market behavior in order to discard my delusions.

 
Aleksey Vyazmikin:

1. Yes, this is just an example; the real sample is large, of course.

2. Thank you, of course it is better to start with something simple. I think this is a new stage in my development in data mining.

3. So it would be good to know about the alternatives in MQL...

4. I am still looking for a way to cluster features (patterns) by their influence on trading. I am afraid I have started to develop my trading system (TS) in the direction of curve fitting, so I want to accumulate ideas in the form of features of market behavior in order to discard my delusions.

http://alglib.sources.ru/dataanalysis/

All of this is available in MQL (the library comes standard with the terminal).

For clustering you can use a simple k-means.
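
To make the k-means suggestion concrete, here is a minimal sketch in plain Python rather than alglib/MQL. The function name `kmeans`, the toy feature rows and the choice of two clusters are assumptions made for illustration only; they are not taken from the thread or from the alglib API.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k rows picked at random
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to the nearest centre (squared distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old centre
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Hypothetical feature rows (two values per bar) forming two obvious groups.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, labels = kmeans(points, k=2)
```

On real features the number of clusters and the scaling of each column matter a lot, so this should be read only as an outline of the two alternating steps.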

And for your table you can try logistic regression (multiple logit regression); there are a lot of videos on YouTube about it and how to use it. It is the basic tool for splitting into classes, like 0 or 1 in your case. Next comes the multilayer perceptron, which also divides into classes, but in a more complex (non-linear) way.
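
As a rough illustration of logistic regression for a 0/1 target, the sketch below fits the model by batch gradient descent in plain Python. It is not the alglib implementation; the names `fit_logistic` and `predict`, the learning rate, the number of epochs and the small table are all assumptions made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Class 1 if the estimated probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Made-up table: two input columns and a 0/1 target.
X = [(0.2, 1.0), (0.4, 0.8), (0.5, 0.9), (1.5, 0.1), (1.7, 0.3), (1.9, 0.2)]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
preds = [predict(w, b, x) for x in X]
```

The decision boundary here is a straight line in the input space, which is what "linear" means in this context; a perceptron with a hidden layer can bend that boundary.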

A single ordinary decision tree is unlikely to work; it is better to use a forest. A forest consists of a set of such trees that are partitioned differently (for example, the first split uses not the 1st variable but the 3rd); the results of all trees are then averaged, which gives a more accurate and stable estimate. But if the problem turns out to be essentially linear, a forest is not suitable; it is better to use logistic regression or a perceptron with a hidden layer. That is why it is recommended to start with the simplest linear models, and if the result is satisfactory, not to bother with anything more complex.

 
Aleksey Vyazmikin:

If I want to look at the effect of days of the week, should I make separate input parameters marking each day, or is it enough to have one parameter with values from 1 to 5?

A forest creates rules by comparing values with "greater than" or "less than" operations.

With values 1,2,3,4,5, if you want, for example, a rule that works only on Wednesday, it takes two branches: "less than Thursday" and "greater than Tuesday".
If the days are separate parameters with labels, one comparison is enough (label greater than zero).
The fewer branches it takes to create a rule, the simpler the model, and the better.

In general, do both together: one column with values 1,2,3,4,5, and another 5 columns with labels.
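
The two encodings can be sketched as follows. The helper name `encode_weekday` and the column order are assumptions made for this example; the point is only that the ordinal column needs two comparisons to isolate Wednesday while the label column needs one.

```python
def encode_weekday(day):
    """day: 1 (Monday) .. 5 (Friday). Returns [ordinal, is_mon, is_tue, is_wed, is_thu, is_fri]."""
    if day not in range(1, 6):
        raise ValueError("expected a trading day in 1..5")
    one_hot = [1 if day == d else 0 for d in range(1, 6)]
    return [day] + one_hot

row = encode_weekday(3)  # Wednesday

# Rule "Wednesday only" on the ordinal column: two comparisons (two branches).
is_wed_ordinal = row[0] > 2 and row[0] < 4

# The same rule on the label column: a single comparison.
is_wed_label = row[3] > 0
```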

 
Aleksey Vyazmikin:

Has anyone ever compared different algorithms for efficiency, in cases where the answer is known, like in my example, but on more complex problems?

Predicting data like yours is called classification: the correct answer can only be one of a few values, or not even a number but a concept ("exit trade", "roll over", etc.).
Neural networks and boosting are good at this; you can train them on such data and then use them for prediction on new data.


I understand that your goal is to extract the most valuable information from your data and get a readable set of rules. In that case neural networks won't work: extracting rules and knowledge from a neural network is not that easy.

A forest gives many alternatives, many trees (formulas), and the final answer is determined by voting: each formula gives its own answer, and the most popular one is chosen in the end. But such a heap of rules is too difficult to interpret. There will be a lot of pictures like the one I added above, each giving its own answer, and the result is the answer that occurs most often.
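
The voting step described above can be sketched like this. The three "trees" are hand-written toy rules, and the field names `rsi`, `weekday` and `spread` are hypothetical; a real forest learns its trees from data.

```python
from collections import Counter

def majority_vote(tree_answers):
    """Return the most common class among the answers of the individual trees.
    On a tie, Counter.most_common keeps the class that was seen first."""
    return Counter(tree_answers).most_common(1)[0][0]

# Hypothetical "trees": each one is a simple rule over a row of features.
trees = [
    lambda r: 1 if r["rsi"] < 30 else 0,
    lambda r: 1 if r["weekday"] == 3 else 0,
    lambda r: 1 if r["spread"] < 2.0 else 0,
]

row = {"rsi": 25, "weekday": 1, "spread": 1.5}
answers = [tree(row) for tree in trees]   # one answer per tree
decision = majority_vote(answers)         # the most frequent answer wins
```

Each individual rule is readable, but with hundreds of trees the combined decision no longer reduces to a single picture, which is the interpretability problem mentioned in the post.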

A single tree gives a picture like the one above, in complicated cases with tens or hundreds of branches in the graph. But all of it can be easily interpreted and reproduced by following the branches in the picture.

There are a lot of models, choose what suits you best.


Alglib in MQL can also do that. But it is inconvenient: after the slightest change you have to compile the script, run it, and wait for the result.

With R or Python, if there is an error you can simply change the previous line of code and rerun it. All objects created while the script was running remain in memory, so you can keep working with them, make predictions, and run new lines of code. There is no need to relaunch the entire script after the slightest change, as in MQL.

 
While there is a lull, I will post some text here, maybe someone will be interested.
 
Yuriy Asaulenko:

And forecasting with 70% confidence in the interval does not give much. It's not very hard to do, but it's still useless.

70% confidence in 50% accuracy really doesn't give much, and 70% accuracy is either a fairy tale or a mistake made by those who use mixed targets. At 70% accuracy the Sharpe ratio would be > 30, which is fantastic even for ultra-HFT.
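
A rough sanity check of the Sharpe figure, under assumptions that are not stated in the post: every trade wins or loses the same amount (+1 or -1), trades are independent, costs are ignored, and annualization is done by the square root of the number of trades per year. The function names are made up for this sketch.

```python
import math

def per_trade_sharpe(p):
    """Mean/std of a bet that wins +1 with probability p and loses -1 otherwise."""
    mean = 2 * p - 1                # expected result per trade
    std = math.sqrt(1 - mean ** 2)  # since every outcome squared equals 1
    return mean / std

def annualized_sharpe(p, trades_per_year):
    """Scale the per-trade ratio by sqrt(N), assuming independent trades."""
    return per_trade_sharpe(p) * math.sqrt(trades_per_year)

s = per_trade_sharpe(0.70)             # roughly 0.44 per trade
needed = math.ceil((30 / s) ** 2)      # trades per year needed to reach a ratio of 30
```

Under these simplified assumptions, a ratio above 30 at 70% accuracy corresponds to a few thousand independent trades per year; with unequal wins and losses or correlated trades the number changes.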

 
SanSanych Fomenko:

For the hundredth time:

1. Data mining is mandatory. Start by selecting only those predictors that AFFECT the target variable, and only then do the rest of the data mining.

2. There are two models:

3. Training of models, with cross-validation if possible.

4. Evaluation of models outside the training file.

5. Test run in the tester.
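
The training and evaluation stages in the list above can be outlined as index bookkeeping: set aside an out-of-sample part first, then cross-validate on the rest. The helper names `holdout_split` and `kfold_indices`, the 20% holdout and the 4 folds are assumptions for this sketch; contiguous, unshuffled folds are used here because the rows are assumed to be time-ordered.

```python
def holdout_split(n, holdout_fraction=0.2):
    """Reserve the last part of the sample for evaluation outside the training file."""
    cut = n - int(round(n * holdout_fraction))
    return list(range(cut)), list(range(cut, n))

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train_idx, test_idx) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 10 rows: the last 2 are never touched during training or cross-validation.
in_sample, out_of_sample = holdout_split(10, 0.2)
folds = list(kfold_indices(len(in_sample), 4))
```

A model would be fitted on each `train` part and scored on the matching `test` part, and only the final candidate would be checked once on `out_of_sample` before going to the tester.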


And for the hundredth time: ALL stages are MANDATORY!


Having done all of this, you can assume that the deposit will not be wiped out right away!


Let's go, men! Stop hanging out on the forum and, with quiet joy, go implement the outlined plan in R.


Three cheers!

It's okay, I'm just kidding. Just like you, I urge people to use ZZ as the target; I was simply naive and did not understand your insidious plans :)

 
Alyosha:

It's okay, I'm just kidding. Just like you, I urge people to use ZZ as the target; I was simply naive and did not understand your insidious plans :)

Again I have to clarify: I'm not campaigning for ZZ; it is simply a very illustrative target for trend-following trading systems.

The target and the predictors for it are extremely complicated and very costly to find, while the model is fairly easy to pick. It happens that some model types are categorically unsuitable for a given target and its predictors, while another type fits. In general, you should always try a dozen or two models.

 
Alyosha:

70% confidence in 50% accuracy really doesn't give much, and 70% accuracy is either a fairy tale or a mistake made by those who use mixed targets. At 70% accuracy the Sharpe ratio would be > 30, which is fantastic even for ultra-HFT.

Once again, for those who don't understand: 70% is reality. Over 70% of the time interval it is quite possible to obtain a forecast that proves correct.

The uselessness of such a forecast is a different question. Out of that 70% of correct forecasts, only about a quarter or less is realistically suitable for entering a trade, i.e. only ~17% of the interval. However, since it is not known in advance where the forecast is correct, and the remaining 30% produces a significant share of both failed trades and missed "correct" trades, it is not possible to exploit the 70% of reliable forecasts.

 
Alyosha:

It's okay, I'm just kidding. Just like you, I urge people to use ZZ as the target; I was simply naive and did not understand your insidious plans :)

The insidious viral plan "ZZ-01" was developed several years ago in a secret laboratory. Fa only acted as its carrier. Eh, Alyosha...
