Machine learning in trading: theory, models, practice and algo-trading - page 386

 

Dr. Trader:

Okay, so this 0.8% is honestly obtained. Apparently the model has a built-in algorithm to protect against overtraining.


Maxim Dmitrievsky:

Well, first of all the dataset is very large, and secondly the nature of the features is not known at all, and models like SVM and random forest are clearly not suitable here; you need to build a complex neural net, maybe that's the reason.


Classification models don't really fit here, yes. You need regression. That is because the result is assessed not by the accuracy of the model but by the Logloss function, which usually scores regression (probability) outputs better than hard class labels.

MultiLogLoss <- function(act, pred) {
  eps  <- 1e-15                                 # clip predictions away from 0 and 1 so log() stays finite
  pred <- pmin(pmax(pred, eps), 1 - eps)
  -sum(act * log(pred) + (1 - act) * log(1 - pred)) / length(act)
}

act (actual) - the expected result, a vector
pred (predicted) - the predicted result, a vector of probabilities

The lower the score of this function, the better. If the result is greater than or equal to 0.6931472, the model is bad: that number is log(2), the score you get by always predicting 0.5, i.e. a random guess.
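
For example, a constant 0.5 prediction reproduces exactly that threshold (the act vector here is made up, just to illustrate):

act <- c(1, 0, 1, 1, 0)
MultiLogLoss(act, rep(0.5, length(act)))
# 0.6931472, i.e. log(2)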

Well, judging by the results of the 54th round, the winner got 0.690467 when evaluated on the new data hidden from the participants; that is the kind of result to aim for.

 
Dr. Trader:

Classification models don't really fit here, yes. You need regression.


A regression neural network produces this on the training sample (which here is also the test sample); I'm not sure how to interpret it correctly, but I think it's bad too ). I.e. a standard simple neural network gives no advantage in regression over classification, and no advantage over other classification methods either. And with normalized inputs and outputs, regression makes no sense to me in this case...


 

SanSanych Fomenko:

Selecting literature on a competing topic


5087 documents matched a search for GARCH, GJR-GARCH and EGARCH in titles and keywords.


GARCH models are supposed to be clever; everything is modeled transparently:

1. the original series, hopeless to model directly, is converted into increments log(X[i] / X[i-1]);

2. the mean is modeled with an ARIMA model;

3. the nuances of the variance are modeled, in the sense of skewness and kurtosis (fat tails), etc.;

4. the distribution itself is modeled - usually either a skewed t-distribution or a skewed GED distribution is taken.


When it comes to exchange trading, regime-switching models are introduced, or changes of the model parameters over time, or the spread are taken into account.


In the articles you can often find ready-made code in R.
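
For example, steps 1-4 can be sketched in R with the rugarch package. A minimal sketch on toy data (the price series here is simulated, just to show the shape of such code; "sstd" is rugarch's skewed Student-t, "sged" would be the skewed GED):

library(rugarch)

set.seed(1)
price <- cumprod(c(100, exp(rnorm(999, 0, 0.01))))  # toy price series standing in for real quotes

r <- diff(log(price))                               # step 1: log increments

spec <- ugarchspec(
  mean.model         = list(armaOrder = c(1, 1)),   # step 2: ARMA model for the mean
  variance.model     = list(model = "sGARCH",       # step 3: GARCH(1,1) variance dynamics
                            garchOrder = c(1, 1)),
  distribution.model = "sstd"                       # step 4: skewed Student-t
)

fit <- ugarchfit(spec, data = r)
ugarchforecast(fit, n.ahead = 1)                    # one-step-ahead forecast of mean and sigma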

 
SanSanych Fomenko:

1. the original series, hopeless to model directly, is converted into increments log(X[i] / X[i-1])


What about the fact that increments do not indicate trends in any way? My model also uses increments for short-term accuracy, but I also look at trend ratios on the side.

Ah, well, you can look at the increments on different time samples. Have you tried training a neural net on returns from different timeframes?

 
Maxim Dmitrievsky:



What about the fact that increments do not indicate trends in any way?

Yes, they don't.

The model either predicts the increment or the direction - the direction is what classification models are for.

I'm not aware of any classification models that recognize movement on news. For GARCH that is the whole point of the model - to work out a movement that has already occurred. Fat tails are exactly the movement on news, when trends break and sharp reversals occur.


Well, you can look at the increments on different timeframes.

There are interesting GARCH models for several timeframes at once. The idea is the following.

Suppose we predict the increment on H1. The model needs input data characterizing the distribution. As such input data (usually volatility) we take not the previous hours but the minutes within the current hour.
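
A rough sketch of this idea with rugarch, assuming hypothetical inputs m1_close (M1 closes, exactly 60 per hour) and h1_ret (the matching hourly log returns):

library(rugarch)

n_hours <- length(m1_close) %/% 60
m1_mat  <- matrix(m1_close[1:(n_hours * 60)], nrow = 60)  # one column of minute closes per hour
rv      <- colSums(diff(log(m1_mat))^2)                   # realized variance inside each hour

spec <- ugarchspec(
  mean.model         = list(armaOrder = c(1, 1)),
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1),
                            external.regressors = matrix(rv)),  # minute-level info enters the H1 variance equation
  distribution.model = "sstd"
)
fit <- ugarchfit(spec, data = h1_ret)
# for real forecasting rv would have to be lagged (built only from already-closed minutes) to avoid look-ahead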

 
Dr. Trader:

The numerai rules have changed a couple of times this year.

It used to be nice and simple: train a model on the train table, check the error on the test table, send them the predictions, they apply them to their hidden test table and compute the error there. Whoever has the smaller error on the hidden table wins. It was very good and correct that the error on the test dataset really matched the one on their hidden dataset - you could check your model.

Then they changed something, and the error on the test dataset stopped correlating with the error on their hidden check dataset. All the leaders disappeared from the top; the winners are just random people lucky enough to have their model fit the hidden check table. Imho a fail by numerai - a random lottery, not a contest.

Then they saw that all the adequate people had left their random contest, realized their mistake, and changed something again. Now the predictions are evaluated by several criteria. The criterion that pisses me off the most is "uniqueness": if someone has sent similar results before, yours will be rejected as plagiarism. I.e. if several people use the same framework to create a model, the one who woke up earlier and sent the forecast takes the money.
Model accuracy is now completely useless for earning anything. You can get an error of 0 and be in 1st place in the top and still earn nothing, because the top shows the result on the test data they themselves give you to download; it no longer shows the result on their hidden test table.
The current iteration of their contest is imho nonsense - no transparency, everything is confusing. I'm waiting for them to change something in the contest again; hopefully it will become adequate again.

How much real money did you make from this site before they changed the rules?
 

 
Dr. Trader:

More like some kind of rebate service )) than paying a data scientist.

 

Each week the top 100 winners are paid $3,600 in total, but the prizes shrink very sharply. First place gets $1,000, then $435, then $257, etc. Even tenth place (there are usually over 500 participants) gets a measly $63. You've got to be kidding me.

I see this contest more as a way to compare my model with the leaders and learn different approaches to datamining, rather than a way to make money.

 

I wanted to see how the leaderboard score (val logloss, vertical axis) relates to the score the model got on the new data (live logloss, horizontal axis) in round 55.

Only those in the bottom-left rectangle did well. The rest, even though they made it onto the leaderboard, all failed on the new data. The best on the leaderboard (the two bottom points on the right) showed the worst results on the new data.

The winner is the one with the leftmost point on the graph, and it looks more like a random outlier than purposeful machine learning.

Interestingly, almost everyone with a logloss of 0.690-0.691 on the validation data showed a good result on the new data; I have no idea what that is about.
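
That kind of check is easy to redo yourself; a sketch, assuming a hypothetical data frame lb with one row per participant:

# lb: hypothetical leaderboard data with columns val_logloss and live_logloss
plot(lb$live_logloss, lb$val_logloss,
     xlab = "live logloss (new data)", ylab = "val logloss (leaderboard)")
abline(h = log(2), v = log(2), lty = 2)  # 0.6931472, the random-guess level on both axes
# the "well done" models are the points left of the vertical and below the horizontal line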

