Machine learning in trading: theory, models, practice and algo-trading - page 1704

 
Mihail Marchukajtes:

A mistake is not just a mistake. A small error can have a big impact.

And an NS is not asked to memorize repetitive data. It is asked to identify hidden patterns so as to produce the right result in the absence of repetitive data. That is what generalization means: when the data domain is finite but we only have 50% of that data, the network learns and, having identified a pattern, can reconstruct the rest of the data it has never seen. It is like restoring old video footage with missing pixels, which the network fills in on its own.

So this is a statistical approach! For example, I have 20 photos of different triangles. On the 21st one, the triangle is missing the hypotenuse. I can easily determine that this is an unfinished triangle, based on the statistics collected (each photo contains a triangle). That is, an error in the drawing of the triangle is not critical, because I already have the overall statistics. Doesn't this principle work in an NS?
 
Tag Konow:
Well, that's a statistical approach! For example, I have 20 photos of different triangles. On the 21st one, the triangle is missing the hypotenuse. I can easily determine that this is an unfinished triangle based on the statistics I've gathered (every photo contains a triangle). That is, an error in the drawing of the triangle is not critical for me, because I already have the overall statistics. Doesn't this principle work in an NS?
Well, approximately - if you don't count that in the 21st photo you have just a corner drawn, and a triangle is out of the question there...
 
Mihail Marchukajtes:
Well, approximately - if you don't count that in the 21st photo you have just a corner drawn, and a triangle is out of the question there...
That's how statistics-based prediction works. I predict that it is an unfinished triangle, which will be completed in the next photo.
 
I'll tell you this. Everything comes with experience, and in ML there are two kinds of experience, theoretical and practical, and they are very different, believe me. The more you practice, the more you begin to understand the philosophy of this field. That's it - off to get potatoes; under self-isolation that looks like a quest :-)
 

To understand the essence and theoretical basis of neural networks, one needs to know the theory of Kolmogorov, Arnold and Hecht-Nielsen.

This knowledge is not particularly necessary for practice, but a general understanding of it would not hurt.

 
elibrarius:

I gave you a link for viewing splits from the JSON data. That's where the full model is unloaded to a file; then the splits are read from it.

Are you sure you can unload asymmetric trees already?


elibrarius:

In boosting, by definition, all trees are important. Each successive tree refines all the previous ones. If you throw out one tree in the middle, all the trees after it will be working with wrong data - they would have to be retrained without taking the discarded tree into account. The first retrained tree would then very closely replicate the discarded one.

No, that's not quite true. When the model is ready, there is a bunch of tree leaves sitting in it that give a probability near 0.5 - which is essentially garbage.


elibrarius:

Yes. Individual leaves in boosting are incomplete, because they are supplemented by the responses of leaves from the other, refining trees. Only the aggregate of the answers of, say, 100 trees gives the correct answer.

Trying to get something reliable from a single leaf of a boosting model is impossible.
In boosting, all 100 responses from the 100 trees are summed; each gives, for example, 0.01, so the total = 1. The value of 1 leaf = 0.01 - what do you want to get from that? There is nothing in it. Only the sum of 100 leaves gives the correct answer.
In fact, the 1st tree there is strong and gives, for example, 0.7, while the rest bring the sum closer to 1. The leaves of the first tree could be considered separately, but I think they are weaker than any tree from a random forest, due to their smaller depth.
In a random forest the average is taken: e.g., if every leaf of the 100 trees = 1, the average also = 1. There the leaves are complete, but with random deviations. And a crowd of 100 answers gives the average as a fairly accurate answer.
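The difference between summing boosted-tree outputs and averaging forest outputs can be sketched with plain numbers (a toy illustration with made-up values, not tied to any particular library):

```python
# Boosting: a strong first tree plus many small corrections.
# Only the SUM of all contributions is meaningful.
boost_leaf_values = [0.7] + [0.003] * 100
boost_prediction = sum(boost_leaf_values)

# A single late leaf carries almost no information on its own.
single_boost_leaf = boost_leaf_values[50]

# Random forest: every tree answers in full strength, with random
# deviations, and the AVERAGE of the answers is taken.
forest_leaf_values = [1.0 + d for d in (-0.1, 0.05, 0.02, -0.03, 0.06)]
forest_prediction = sum(forest_leaf_values) / len(forest_leaf_values)

print(boost_prediction)    # close to 1
print(single_boost_leaf)   # tiny on its own: 0.003
print(forest_prediction)   # close to 1; each leaf was already close to 1
```

The point of the sketch: dropping one boosting term shifts the sum the later trees were fitted against, while dropping one forest tree barely moves the average.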

It is possible to get something useful from a single tree, but rarely.

Even the first tree will not be the best.

Boosting is fast, but you have to select models carefully, and that requires a large sample, which is a problem.

 
Aleksey Nikolayev:

To understand the essence and theoretical basis of neural networks, one needs to know the theory of Kolmogorov, Arnold and Hecht-Nielsen.

For practice, this knowledge is not particularly necessary, but a general understanding of it would not hurt.

P.S.

The lack of a rigorous theory for the above-mentioned neural network models does not prevent us from researching the possibilities of applying them to practical tasks.
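For reference - the Kolmogorov-Arnold representation theorem, which Hecht-Nielsen later related to multilayer networks, states that any continuous function of n variables on the unit cube can be expressed through addition and superpositions of continuous one-variable functions:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
```

Here the inner functions \varphi_{q,p} do not depend on f; only the outer functions \Phi_q do, which is why the result is often cited as a theoretical motivation for networks with one hidden layer.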

 
Aleksey Vyazmikin:

Are you sure you can unload asymmetric trees already?

I tried an example, adding grow_policy='Lossguide'.
Here's a piece of the model containing the splits:

'split': {'border': 16.049999237060547, 'float_feature_index': 1, 'split_index': 0, 'split_type': 'FloatFeature'}}, 'right': {'left': {'value': 0.05454545454545454, 'weight': 153}, 'right': {'value': 0.8265895953757225, 'weight': 161}, 'split': {'border': 5.999999046325684, 'ctr_target_border_idx': 0, 'split_index': 4, 'split_type': 'OnlineCtr'}}, 
'split': {'cat_feature_index': 1, 'split_index': 1, 'split_type': 'OneHotFeature', 'value': -845129958}}, {'left': {'left': {'value': -0.43103007753084105, 'weight': 444}, 'right': {'value': -0.10906568919445614, 'weight': 133}, 'split': {'border': 6.999999046325684, 'ctr_target_border_idx': 0, 'split_index': 2, 'split_type': 'OnlineCtr'}}, 'right': {'left': {'value': 0.02835585997337218, 'weight': 163}, 'right': {'value': 0.5713236394502054, 'weight': 151},
'split': {'border': 5.999999046325684, 'ctr_target_border_idx': 0, 'split_index': 3, 'split_type': 'OnlineCtr'}}, 
'split': {'cat_feature_index': 1, 'split_index': 1, 'split_type': 'OneHotFeature', 'value': -845129958}

There are splits with Depthwise too. But this is in Python; I haven't seen a way to unload the model in R. I think, though, that you can save the model in R in its internal format, open it in Python and unload it to JSON, and then take everything you need from that.
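As a sketch (assuming the JSON layout shown in the dump above, with nested 'left'/'right'/'split' keys for internal nodes and 'value'/'weight' for leaves), the splits can be collected from such a dump with a short recursive walk:

```python
# Collect every 'split' entry from a CatBoost-style JSON tree dump.
# Internal nodes look like {'left': ..., 'right': ..., 'split': {...}},
# leaves look like {'value': ..., 'weight': ...}.
def collect_splits(node):
    splits = []
    if isinstance(node, dict):
        if 'split' in node:
            splits.append(node['split'])
        for side in ('left', 'right'):
            if side in node:
                splits.extend(collect_splits(node[side]))
    return splits

# A small fragment shaped like the dump above (numbers illustrative).
tree = {
    'split': {'border': 16.049999237060547, 'float_feature_index': 1,
              'split_index': 0, 'split_type': 'FloatFeature'},
    'left': {'value': -0.43103007753084105, 'weight': 444},
    'right': {
        'split': {'border': 5.999999046325684, 'ctr_target_border_idx': 0,
                  'split_index': 4, 'split_type': 'OnlineCtr'},
        'left': {'value': 0.05454545454545454, 'weight': 153},
        'right': {'value': 0.8265895953757225, 'weight': 161},
    },
}

for s in collect_splits(tree):
    print(s['split_type'], s.get('border'))
```

The same walk works on the full file after `json.load`, since the exported model is just nested dictionaries.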

 
Colleagues, the topic is interesting, but I do not understand the difference between machine learning and simple parameter optimization in the tester. After all, the goal will be approximately the same - making a decision about entering (exiting) the market when the entry (exit) parameters coincide. Both methods stop working when the market changes strongly, and you have to train and optimize anew. Briefly and without philosophy.
 
Yan Barmin:
Colleagues, the topic is interesting, but I do not understand the difference between machine learning and simple parameter optimization in the tester. After all, the goal will be approximately the same - making a decision about entering (exiting) the market when the entry (exit) parameters coincide. Both methods stop working when the market changes strongly, and you have to train and optimize anew. Briefly and without philosophy.

The difference is in the flexibility of the tool. A neural network can adapt to any series. Look at the polynomial below. The precision of the coefficients, as well as their number, can be completely different.

double getBinaryClassificator1(double v0, double v1, double v2, double v3, double v4, double v5, double v6, double v7, double v8, double v9, double v10) {
   double x0 = 2.0 * (v0 + 2.6721302849319) / 5.70376880500565 - 1.0;
   double x1 = 2.0 * (v1 + 0.862195874260524) / 4.318953971518134 - 1.0;
   double x2 = 2.0 * (v2 + 0.636958350251177) / 2.893126110958697 - 1.0;
   double x3 = 2.0 * (v3 + 1.28131145039971) / 4.47439455086403 - 1.0;
   double x4 = 2.0 * (v4 + 1.40655622673661) / 3.84454848827483 - 1.0;
   double x5 = 2.0 * (v5 + 1.05792133319783) / 4.0361119526354905 - 1.0;
   double x6 = 2.0 * (v6 + 0.960632890559664) / 2.810809591513934 - 1.0;
   double x7 = 2.0 * (v7 + 2.50474545671368) / 4.50657217846072 - 1.0;
   double x8 = 2.0 * (v8 + 3.37124943164126) / 5.00153555828254 - 1.0;
   double x9 = 2.0 * (v9 + 1.01434366581359) / 3.81959911946484 - 1.0;
   double x10 = 2.0 * (v10 + 0.997401251919643) / 2.959840023725593 - 1.0;
   double decision = 0.0455519244734931 * sigmoid(x0 + x5 + x8)
  + 0.01733841684822077 * sigmoid(x5 + x7 + x8)
  + 0.21269063180827888 * sigmoid(x0 + x5 + x7 + x8)
  + 0.02875816993464052 * sigmoid(x0 + x8 + x9)
  - 0.025853304284676835 * sigmoid(x0 + x4 + x8 + x9)
  + 0.021169208424110384 * sigmoid(x0 + x7 + x10)
  + 0.07184095860566449 * sigmoid(x0 + x8 + x10)
  + 0.03769063180827887 * sigmoid(1.0 + x0 + x3 + x5 + x8)
  - 0.03179012345679012 * sigmoid(1.0 + x3 + x6 + x9)
  + 0.02750544662309368 * sigmoid(1.0 + x0 + x5 + x7 + x9)
  + 0.1463507625272331 * sigmoid(1.0 + x1 + x2 + x8 + x9)
  + 0.012799564270152506 * sigmoid(1.0 + x0 + x2 + x10)
  + 0.1864560639070443 * sigmoid(1.0 + x0 + x1 + x5 + x8 + x10)
  + 0.07494553376906318 * sigmoid(1.0 + x0 + x2 + x5 + x8 + x10)
  + 0.014669571532316631 * sigmoid(1.0 + x2 + x5 + x6 + x8 + x10)
  + 0.05266884531590414 * sigmoid(1.0 + x0 + x1 + x7 + x8 + x10)
  + 0.04566085693536674 * sigmoid(1.0 + x0 + x1 + x2 + x8 + x9 + x10)
  + 0.061546840958605666 * sigmoid(1.0 + x0 + x2 + x4 + x8 + x9 + x10);
   return decision;
}
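The structure of such a generated "polynomial" is easier to see in a more general form. Below is a hedged Python sketch of the same scheme - inputs shifted and scaled to roughly [-1, 1], then a weighted sum of sigmoids over subsets of the normalized inputs. The offsets, scales and weights here are placeholders for illustration, not the coefficients from the code above:

```python
import math

def sigmoid(x):
    # logistic function, as assumed by the generated code above
    return 1.0 / (1.0 + math.exp(-x))

def binary_classifier(v, offsets, scales, terms):
    """v: raw inputs; offsets/scales: per-input normalization constants;
    terms: list of (weight, bias, input_indices) triples."""
    # normalize each input to roughly [-1, 1]
    x = [2.0 * (vi + o) / s - 1.0 for vi, o, s in zip(v, offsets, scales)]
    # weighted sum of sigmoids over subsets of the normalized inputs
    return sum(w * sigmoid(b + sum(x[i] for i in idx))
               for w, b, idx in terms)

# Placeholder example with 3 inputs and 2 terms (illustrative numbers only).
decision = binary_classifier(
    v=[0.5, -0.2, 1.1],
    offsets=[1.0, 0.5, 0.3],
    scales=[3.0, 2.0, 2.5],
    terms=[(0.4, 0.0, (0, 2)), (-0.1, 1.0, (1,))],
)
print(decision)
```

In this form it is clear that "optimizing" such a model means searching over which input subsets enter each term and what weight each term gets, which is a far larger search space than tuning a handful of Expert Advisor parameters.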

Now imagine the number of possible variants. No multi-parameter Expert Advisor can boast of such flexibility. They are also said to have generalization ability, but that's not quite true :-)
