Machine learning in trading: theory, models, practice and algo-trading - page 385

 
Maxim Dmitrievsky:


So this is the kind of task they give? I understand it's a hedge fund; if you sign up, what does it get you?

I'll run it through different models; so far I get the same 0.5 as you did.

Well, 0.5 and 0.513 are different; of course not by much for trading, but still. They call themselves a hedge fund; what their legal structure is I don't know, in America there formally seems to be no such type of organization, but I may be wrong. If you register you can take part in predicting the market a week ahead, on datasets like these; some people manage to make more than $10k on it, but I personally know people who have only earned a couple of hundred bucks)))
 
Aliosha:
Well, 0.5 and 0.513 are different; of course not by much for trading, but still. They call themselves a hedge fund; what their legal structure is I don't know, in America there formally seems to be no such type of organization, but I may be wrong. If you register you can take part in predicting the market a week ahead, on datasets like these; some people manage to make more than $10k on it, but I personally know people who have only earned a couple of hundred bucks)))


I.e., how does it work? They throw a dataset at you, I have to train a network on it, and then what? I think there's a trick to it, you need to do feature selection )

https://docs.microsoft.com/ru-ru/azure/machine-learning/machine-learning-data-science-select-features

Feature selection in the Team Data Science Process
  • 2017.03.24
  • bradsev
  • docs.microsoft.com
This article describes the purposes of feature selection and gives examples of its role in enhancing data for machine learning. The examples are taken from Azure Machine Learning Studio. Feature engineering: this process aims to create additional features from the relevant existing raw...
 
Maxim Dmitrievsky:


I.e., how does it work? They throw a dataset at you, I have to train a network on it, and then what? I think there's a trick to it, you need to do feature selection )

https://docs.microsoft.com/ru-ru/azure/machine-learning/machine-learning-data-science-select-features

Get registered and read the rules, they are half a page. You just download the set, train, run it on the test and send them the result; there is an example of what the result should look like, so the ids and column names have to match theirs.
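
[Editor's note] A minimal sketch, for illustration only, of what that "download, train, predict, send" loop could look like in Python with pandas and scikit-learn. The file names, the "feature" column prefix, and the "id"/"target"/"probability" column names are assumptions here and must be checked against the example file the organizers provide.

# Sketch of a download-train-predict-submit loop.
# File and column names are assumptions; match them to the
# example file provided by the organizers.
import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("numerai_training_data.csv")          # assumed file name
tournament = pd.read_csv("numerai_tournament_data.csv")   # assumed file name

feature_cols = [c for c in train.columns if c.startswith("feature")]  # assumed prefix

model = LogisticRegression(max_iter=1000)
model.fit(train[feature_cols], train["target"])            # assumed target column

# The submission must keep the organizers' ids and column names.
submission = pd.DataFrame({
    "id": tournament["id"],                                # assumed id column
    "probability": model.predict_proba(tournament[feature_cols])[:, 1],
})
submission.to_csv("predictions.csv", index=False)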
 
Alyosha:
Get registered and read the rules, they are half a page. You just download the set, train, run it on the test and send them the result; there is an example of what the result should look like, so the ids and column names have to match theirs.

Yes, I'll try it later... in short, this dataset is hopeless, there is no pattern in it )
 

The numerai rules have changed a couple of times this year.

It used to be nice and simple: train a model on the train table, check the error on the test table, send them your predictions, they apply them to their hidden test table and compute the error on it. Whoever has the lowest error on the hidden table wins. It was good and correct that the error on the test dataset really matched the error on their hidden dataset, so you could check your model.

Then they changed something, and the error on the test dataset stopped correlating with the error on their hidden check dataset. All the leaders disappeared from the top; the winners are just random people whose models happened to fit their hidden check table. Imho a fail by numerai, random nonsense, not a contest.

Then they saw that all the sensible people had left their random contest, realized their mistake, and changed something again. Now predictions are evaluated by several criteria. The criterion that annoys me the most is "uniqueness": if someone has already sent similar results, yours will be rejected as plagiarism. I.e., if several people use the same framework to build a model, the one who woke up earliest and sent the forecast takes the money.
Model accuracy is now completely useless for winning anything. You can have an error of 0, be in first place in the top, and earn nothing, because the leaderboard shows the result on the test data they themselves give you to download; it no longer shows the result on their hidden test table.
The current iteration of their contest is imho nonsense: no transparency, everything is confusing. I'm waiting for them to change something again; hopefully it will become adequate again.

 
Maxim Dmitrievsky:

Yes, I'll try it later... in short, this dataset is hopeless, there is no pattern in it )
Try this table. Train the model only on the rows where data_type=="validation". This is the data used to evaluate the model and get into the top. If you achieve 100% accuracy, you'll be in first place in the top. But you won't get a cash prize for such a cheat.
Files:
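
[Editor's note] A sketch of what that filtering could look like with pandas, assuming the attached table is a CSV with the data_type column mentioned above; the file name, the "feature" prefix and the "target" column are placeholders.

# Keep only the rows the leaderboard is evaluated on,
# i.e. data_type == "validation" (column name taken from the post;
# file name and other column names are placeholders).
import pandas as pd

df = pd.read_csv("tournament_table.csv")
validation = df[df["data_type"] == "validation"]

feature_cols = [c for c in validation.columns if c.startswith("feature")]  # assumed prefix
X, y = validation[feature_cols], validation["target"]                      # assumed target column
# ...train the model on X, y only, then predict the full table as usual.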
 
Dr. Trader:
Try this table. Train the model only on the rows where data_type=="validation". This is the data used to evaluate the model and get into the top. If you achieve 100% accuracy, you'll be in first place in the top. But you won't get a cash prize for such a cheat.

Oh, cool, I'll try it tomorrow... great for training)
 
Dr. Trader:
Try this table. Train the model only on the rows where data_type=="validation". This is the data used to evaluate the model and get into the top. If you achieve 100% accuracy, you'll be in first place in the top. But you won't get a cash prize for such a cheat.


again 0.5



 

It is important to see how the results compare on the training and evaluation datasets. I see a data split there; by the logic of it (maybe I'm wrong), the data is randomly divided into two groups: the model is trained on the first group, and the second group is used only for prediction and evaluating the model.

What result do you get if the model predicts the same data it was trained on?
Then predict on the data that wasn't used for training, and compare the model's accuracy in both cases.

If it predicts with 100% accuracy on the training data but only 50% on the evaluation data, then the model is overfitted, and that is bad.
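
[Editor's note] A rough illustration of this check with scikit-learn; the classifier, the 80/20 split ratio and the random placeholder data are arbitrary choices, stand-ins for the real features and target.

# Overfitting check: compare accuracy on the data the model was
# trained on with accuracy on data it has never seen.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))       # placeholder features
y = rng.integers(0, 2, size=1000)     # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

acc_train = accuracy_score(y_train, model.predict(X_train))
acc_test = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {acc_train:.3f}, test accuracy: {acc_test:.3f}")
# ~100% on train with ~50% on test means the model is overfitted.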

 
Dr. Trader:

It is important to see how the results compare on the training and evaluation datasets. I see a data split there; by the logic of it (maybe I'm wrong), the data is randomly divided into two groups: the model is trained on the first group, and the second group is used only for prediction and evaluating the model.

What result do you get if the model predicts the same data it was trained on?
Then predict on the data that wasn't used for training, and compare the model's accuracy in both cases.

If it predicts with 100% accuracy on the training data but only 50% on the evaluation data, then the model is overfitted, and that is bad.


I get 50% prediction on the training data too. I removed the data split and fed the same set in as the test one.

Well, firstly the set is very large, and secondly the nature of the features is completely unknown, and simple models like SVM and forest are clearly not suitable here; you need to build a complex neural network, maybe that's the reason. I haven't figured out yet how to make the neural network in this Studio more complex, for example to try convolution.

From this one: https://gallery.cortanaintelligence.com/Experiment/Neural-Network-Convolution-and-pooling-deep-net-2

I'm still new to neural networks... )
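
[Editor's note] Outside of Azure ML Studio, one quick way to check whether a more complex non-linear model helps at all is a small multilayer perceptron. A sketch with scikit-learn; the hidden-layer sizes are arbitrary, and X and y below are random placeholders for the competition's feature matrix and target.

# A small non-linear baseline: a multilayer perceptron instead of a
# simpler model. Hidden-layer sizes are arbitrary; X and y are
# placeholders for the real features and target.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))      # placeholder features
y = rng.integers(0, 2, size=1000)    # placeholder labels

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
scores = cross_val_score(mlp, X, y, cv=5, scoring="accuracy")
print("cross-validated accuracy:", scores.mean())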
