Machine learning in trading: theory, models, practice and algo-trading - page 1296

 
Dimitri:

At the beginning of the Renaissance, Byzantium no longer existed, Constantinople was the capital of the Ottoman Empire, and the Crusades had ended 200 years before that.

Don't joke like that...

The revival is divided into four stages:

  1. Proto-Renaissance (2nd half of the 13th century to the 14th century)
  2. Early Renaissance (early 15th to late 15th century)
  3. High Renaissance (late 15th century to the first 20 years of the 16th century)
  4. Late Renaissance (mid-16th to 1590s)

The Fourth Crusade of 1202-1204 (the beginning of the 13th century).

After looting the richest and largest city in Europe... they (the crusaders) created a state with Constantinople as its capital, the Latin Empire. The struggle against the conquerors lasted more than 50 years. In 1261 the Latin Empire fell. Byzantium was restored, but it never regained its former power.

Over 50 years of plundering and living in Byzantium, the newly enriched Europeans (mostly Venice and its districts, which handled all the shipping) developed a taste for the beautiful life and started paying creative people well. The creative ones hired and trained apprentices, who then excelled their teachers, and so on. And so it went, little by little.

 
I need an opinion on sampling in CatBoost. Training requires two samples: the optimization function works on the first one, and the model is selected on the second one. That is, the second sample decides when training should stop in order to avoid overfitting, and for this the training results are applied to it. So in effect we look for a pattern on the training sample, say 2014-2016, then check this pattern on 2016-2017, and then test the model independently on a third sample, 2018-2019.

This large time spread bothers me. Or rather, I doubt that we need a large test sample at all, because we want to catch a pattern that is stable over a long period, but we do not know how long this pattern lasts... I am thinking that 2-3 months is enough for the test sample (the second one, used for stopping): it will reveal some cyclic tendency that repeats both earlier and later. But then there is a risk that, before it picks up this cyclic pattern on the training sample, the model will build too many trees describing something else, and only then will build trees describing the tendency in the test sample. In short, I am in doubt and can't figure out an experiment that would help determine how long each of the three samples should be. Does anyone have any ideas about this?
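A minimal sketch of the three-sample split described above, cut strictly by date. The boundary dates, the monthly toy rows and the function name `split_by_date` are assumptions made for illustration; only the idea of train / stopping sample / independent test comes from the post.

```python
from datetime import date

def split_by_date(rows, valid_start, test_start):
    """Split (date, features, label) rows into three samples by date."""
    train = [r for r in rows if r[0] < valid_start]                # optimization works here
    valid = [r for r in rows if valid_start <= r[0] < test_start]  # decides when to stop
    test = [r for r in rows if r[0] >= test_start]                 # independent check
    return train, valid, test

# Toy data: one placeholder row per month from 2014 through 2019.
rows = [(date(y, m, 1), None, None) for y in range(2014, 2020) for m in range(1, 13)]

# Hypothetical boundaries: a 3-month stopping sample, then 2018-2019 as the test.
train, valid, test = split_by_date(rows, date(2017, 10, 1), date(2018, 1, 1))
print(len(train), len(valid), len(test))
```

Moving `valid_start` back and forth while keeping `test_start` fixed is one way to compare stopping samples of different lengths against the same independent test.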
 
elibrarius:

Over 50 years of plundering and living in Byzantium, the newly enriched Europeans (mostly Venice and its districts, which handled all the shipping) developed a taste for the beautiful life and began to pay creative people well. The creative ones hired and trained apprentices, who then excelled their teachers, and so on. And so it went, little by little.

But this is, of course, IMHO.

A European program said that the fall of Constantinople was a blessing in spite of the hundreds of thousands of victims and the destruction. Europe received a great influx of educated people, who were eagerly hired, and this made it possible to recover some of the knowledge lost since the Roman Empire, which helped it emerge from the Middle Ages.

That is to say, even such blasphemous acts as war, even now, are presented as good deeds for all mankind... History is written by the victors.

 
Aleksey Vyazmikin:
but then there is a risk that before revealing this cyclicality on the training sample, the model will make too many trees describing something else, and only then will build trees describing the trend on the test sample.

Neither trees nor NNs separate the rows by time; they even shuffle them. So no trees are built "only then". They are all built on evenly shuffled data. Rows from 2014 and 2016 can stand side by side.

If you don't shuffle the rows for an NN, it will simply overfit on the first examples, get stuck, and never properly learn the latest data. After shuffling the rows, the NN learns evenly. For trees you don't need to shuffle rows if r=1 (the fraction of rows used to train one tree), but it is usually set below 1 to avoid overfitting, so you need to shuffle there too; otherwise at r=0.5 you would take only the 2014 and 2015 data.
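The point about r=0.5 can be illustrated with a toy list of rows in time order. This is only a sketch of the argument, not a reproduction of how any particular library samples rows; the row counts and the seed are arbitrary assumptions.

```python
import random

years = [2014] * 100 + [2015] * 100 + [2016] * 100  # rows in time order
r = 0.5                                              # fraction of rows per tree
n = int(len(years) * r)

head = years[:n]                 # no shuffling: just the first half of the rows
rng = random.Random(0)           # fixed seed so the run is repeatable
sampled = rng.sample(years, n)   # random subsample, same as shuffle-then-take

print(sorted(set(head)))         # years seen without shuffling
print(sorted(set(sampled)))      # years seen with random sampling
```

Without shuffling, the tree sees nothing from the last year; with random sampling all three years are represented.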

Aleksey Vyazmikin:
I can't figure out how to conduct an experiment that will help determine how long each of the three samples should be. Do you have any ideas about it?

I think this needs to be optimized too. But I think the number of rows should be no less than 1000-10000, so that the sample is representative and averages out random deviations. Otherwise you may end up fitting to the random bias of a small sample.

 
Aleksey Vyazmikin:

A European program said that the fall of Constantinople was a good thing, in spite of the hundreds of thousands of victims and destruction, Europe was flooded with educated immigrants who were willingly hired and thus recovered some of the lost knowledge from the Roman Empire, which contributed to the emergence from the Middle Ages.

That is to say, even such blasphemous acts as war, even now, are presented as good deeds for all mankind... History is written by the victors.

Of course, everyone sees "good" from his own side. For the Europeans, of course, plundering finances and brains was a good thing. For the Byzantines it was no boon; for many it was death.

I don't remember exactly, but in its golden age the annual taxes of Byzantium were about 2-4 thousand tons of gold. Even in our time that is a very good sum for many countries. But I may be wrong about the figures: I watched a film a few years ago and it was mentioned there. If you are interested, take a look. I happened to notice it at the beginning of the film: several hundred tons were carried off in coins alone.


 
elibrarius:

Neither trees nor NNs separate the rows by time; they even shuffle them. So no trees are built "afterwards". They are all built on evenly shuffled data. Rows from 2014 and 2016 can stand side by side.

Perhaps I didn't quite get my point across.

Look, we built one tree on the training sample and it covered 10% of the sample (Recall), and so on for, say, 20 trees, each adding 3%-7% to Recall. But that is on the training sample. On the test sample, maybe only trees 5 and 6 will give any response at all in terms of recall and precision, and the trees before and after them will then be noise. If those "after" are cut off by the algorithm, those "before" will remain. Thus we get a model containing both trees that help classification and trees that hinder it or just behave passively. That is why the question is precisely about the size of the test sample and what it contains.

In total I have about 14 rows; they have to be divided into 3 samples.

Perhaps for these types of models it is effective to cut out different chunks of the sample and then test the resulting models on the whole sample for stability... I'm pondering.

 
elibrarius:

Of course, everyone sees "good" from his own side. For the Europeans, of course, plundering finances and brains was a boon. For the Byzantines it was no boon; for many it was death.

I don't remember exactly, but in the heyday of Byzantium the annual taxes were about 2-4 thousand tons of gold. Even in our time that is a very good sum for many countries. But I may be wrong about the figures: I watched a film a few years ago and it was mentioned there. If you are interested, take a look. I happened to notice it at the beginning of the film: several hundred tons were carried off in coins alone.


I'll look at the video, thank you, but I'd rather see it from the Japanese or some other independent party...

 
Aleksey Vyazmikin:
maybe only trees 5 and 6 will give any response at all in terms of recall and precision, and the trees before and after them will then be noise. If those "after" are cut off by the algorithm, those "before" will remain.

Which algorithm would cut trees out of an already built forest? The forest stops growing either when it reaches a set number of trees or by some other criterion, when it decides it has learned well enough. If there is pruning during training, it will have a positive effect on the error on train (and on valid, if there is one).

Well, in general, of course, part of the trees will vote for, part against. It's impossible to get rid of it, because that's what allows forests to learn well, unlike individual trees, due to averaging of votes. When boosting, only the first tree learns from data, all the rest from errors.

 
elibrarius:

Which algorithm would cut trees out of an already built forest? The forest stops growing either when it reaches a set number of trees or by some other criterion, when it decides it has learned well enough. If there is pruning during training, it will have a positive effect on the error on train (and on valid, if there is one).

As for which algorithm prunes: CatBoost does it during training. You can set a parameter there so that if 10 (or however many you specify) new trees have not improved the result, the model without those last 10 trees is taken, and so it is the best of what there is.
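The stopping rule described here can be sketched in a few lines. This is not CatBoost's code, only the rule itself under the assumption that one validation error is recorded per added tree; the function name `best_iteration` and the error values are made up for illustration. In CatBoost, as far as I know, the corresponding settings are the overfitting detector (`od_type`, `od_wait`) together with `use_best_model`.

```python
def best_iteration(valid_errors, patience):
    """Return how many trees to keep, given one validation error per tree.

    Training stops once `patience` trees in a row have failed to beat the
    best error so far; the model is rolled back to that best point.
    """
    best_err = float("inf")
    best_n = 0
    for n_trees, err in enumerate(valid_errors, start=1):
        if err < best_err:
            best_err, best_n = err, n_trees
        elif n_trees - best_n >= patience:
            break  # no improvement for `patience` trees: stop growing
    return best_n

# Hypothetical validation errors after each of 7 trees.
errors = [0.50, 0.45, 0.43, 0.44, 0.46, 0.45, 0.40]
print(best_iteration(errors, patience=3))
print(best_iteration(errors, patience=10))
```

Note that only the tail is cut: every tree up to the best point stays in the model, whether or not it helps on the second sample, which is exactly the concern raised earlier in the thread. With a short patience the rule can also stop before a later improvement (tree 7 in the toy data).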

elibrarius:

When boosting, only the first tree learns from the data, all the others from errors.

Interesting statement. The subsequent trees are indeed built to reduce the error of the existing tree composition, but I don't understand why you say they don't use the sample then. Tell me in more detail, maybe I'm missing something deeper...

 
Aleksey Vyazmikin:

Interesting statement. The subsequent trees are indeed built to reduce the error of the existing tree composition, but I don't understand why you say they don't use the sample then. Tell me in more detail, maybe I'm missing something deeper...

Yes, in order to reduce the error, they take the errors themselves as the target and then subtract them.

Here's the boosting algorithm, I'm studying it myself: https://neurohive.io/ru/osnovy-data-science/gradientyj-busting/


1. Fit a linear regression or a decision tree on the data (a decision tree is chosen in the code here) [call the input x and the output y] (the 1st tree is trained on the data)

2. Calculate the errors: the actual target value minus the predicted target value [e1 = y - y_predicted1]

3. Fit a new model on the errors as the target variable, with the same input variables [call it e1_predicted] (the 2nd and all other trees are trained on errors)

4. Add the predicted errors to the previous predictions
[y_predicted2 = y_predicted1 + e1_predicted]

5. Fit another model on the remaining errors, i.e. [e2 = y - y_predicted2], and repeat steps 2 through 5 until the model begins to overfit or the sum of errors becomes constant. Overfitting can be controlled by constantly checking the accuracy on validation data.
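Steps 1 through 5 can be sketched in plain Python with one-split trees (stumps) on a single input variable. This is a toy under stated assumptions, not the code from the linked article and not what CatBoost does internally: the helper names, the data, the absence of a learning rate and the missing validation check are all simplifications made here.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree on 1-D inputs by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees):
    """Each new stump is fitted to the errors of the current sum of stumps."""
    trees = []
    predictions = [0.0] * len(xs)  # so the 1st stump is fitted to y itself
    for _ in range(n_trees):
        errors = [y - p for y, p in zip(ys, predictions)]   # step 2: e = y - y_predicted
        tree = fit_stump(xs, errors)                         # step 3: fit on the errors
        trees.append(tree)
        predictions = [p + tree(x) for p, x in zip(predictions, xs)]  # step 4: add
    return trees, predictions

def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

# Toy data (made up): a noisy staircase.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 3.0, 5.2, 5.1]

_, pred_1 = boost(xs, ys, n_trees=1)
_, pred_5 = boost(xs, ys, n_trees=5)
print(mse(ys, pred_1), mse(ys, pred_5))
```

Every stump still sees the same input rows x; only the target changes from y to the current errors. In that sense the later trees do use the sample, just not its original target.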


As I understand it, this is classic boosting. Maybe CatBoost has invented something of its own...

Gradient boosting: the complex made simple (neurohive.io, 2018.11.27)