Machine learning in trading: theory, models, practice and algo-trading - page 2

 

I will clarify the conditions for the cash prize:

5 credits will go to the first one to solve the problem

Deadline for solutions: June 30, 2016.

 

Here is an example of applying the informative-feature selection algorithm to build a trading strategy.

You probably read my blogs about the Big Experiment: https://www.mql5.com/ru/blogs/post/661895

And here is the picture:

I tried to find one pattern for five pairs and estimate the percentage of correctly guessed trades on a validation sample spanning about 25 years. It did not work right off the bat: I did not reach the desired accuracy for any of the forecast horizons.

Next, let's take just one pair, EURUSD. I found a dependence between the price movement 3 hours ahead and a subset of my predictors.

I converted the predictors to categorical form and ran my function for selecting significant predictors. I did it just now, while at work, in 20 minutes.

[1] "1.51%"

> final_vector <- c((sao$par >= threshold), T)

> names(sampleA)[final_vector]

[1] "lag_diff_45_var"      "lag_diff_128_var"     "lag_max_diff_8_var"   "lag_max_diff_11_var"  "lag_max_diff_724_var" "lag_sd_362_var"      

[7] "output"   

Convergence did not come that quickly, but I got a result at the level of about one and a half percent explanatory power.

The convergence graph (minimization).
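The categorization step mentioned above (reducing the raw predictors to categorical form) can be sketched in a few lines. This is a minimal sketch only, assuming the infotheo package, whose discretize() is also used in the full code further below; the input vector here is synthetic, not one of the real predictors.

library(infotheo)

# Synthetic stand-in for one raw predictor column
set.seed(1)
raw_predictor <- rnorm(1000)

# Equal-frequency binning into 3 levels: each level receives roughly a third of the observations
predictor_cat <- discretize(raw_predictor, disc = "equalfreq", nbins = 3)[, 1]

table(predictor_cat)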

Next is the construction of the model.

We have several categorical predictors. We build a "rule book": for each combination of predictor levels, what the relationship to the output is - long or short on a 3-hour horizon.

Here is what the result looks like:

predictor levels   sell   buy   pval       concat   direction
121121             11     31    2.03E-03   121121   1
211112              3     15    4.68E-03   211112   1
222222             19      4    1.76E-03   222222   0
222311              8      0    4.68E-03   222311   0
321113              7      0    8.15E-03   321113   0
333332             53     19    6.15E-05   333332   0

We see the skew between the number of buys and sells in each row and the corresponding p-value of the chi-square test against a 50/50 distribution. We keep only those rows where the p-value is below 0.01.
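As a sanity check, the last row of the table (53 sells vs. 19 buys) can be reproduced directly; this is exactly the call used inside the loop in the code below:

chisq.test(x = c(53, 19), p = c(0.5, 0.5))$p.value
# ~6.15e-05, matching the pval column of the table above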

And here is the code for the whole experiment, starting from the point where the inputs have already been selected:

# Predictors selected at the previous step, plus the output
dat_test <- sampleA[, c("lag_diff_45_var"
                        , "lag_diff_128_var"
                        , "lag_max_diff_8_var"
                        , "lag_max_diff_11_var"
                        , "lag_max_diff_724_var"
                        , "lag_sd_362_var"
                        , "output")]

# Concatenate the categorical levels of the predictors into one pattern string per bar
dat_test$concat <- do.call(paste0, dat_test[1:(ncol(dat_test) - 1)])

# Contingency table: pattern vs. output (0 = sell, 1 = buy)
x <- as.data.frame.matrix(table(dat_test$concat, dat_test$output))

# Chi-square test of each row against a 50/50 split
x$pval <- NA
for (i in 1:nrow(x)){
  x$pval[i] <- chisq.test(x = c(x$`0`[i], x$`1`[i]), p = c(0.5, 0.5))$p.value
}

# The "rule book": only the patterns with a significant skew
trained_model <- subset(x, x$pval < 0.01)

trained_model$concat <- rownames(trained_model)

trained_model$direction <- NA
trained_model$direction[trained_model$`1` > trained_model$`0`] <- 1
trained_model$direction[trained_model$`0` > trained_model$`1`] <- 0

### test model

library(infotheo)   # discretize() with disc = "equalfreq" is assumed to come from this package

load('C:/Users/aburnakov/Documents/Private/big_experiment/many_test_samples.R')

many_test_samples_eurusd_categorical <- list()

# Discretize each of the 49 raw validation samples into 3 equal-frequency levels
for (j in 1:49){

  dat <- many_test_samples[[j]][, c(1:108, 122)]
  disc_levels <- 3

  for (i in 1:108){
    naming <- paste(names(dat[i]), 'var', sep = "_")
    dat[, naming] <- discretize(dat[, names(dat)[i]], disc = "equalfreq", nbins = disc_levels)[, 1]
  }

  # Output: direction of the future price move (future_lag_181 > 0 -> buy, < 0 -> sell)
  dat$output <- NA
  dat$output[dat$future_lag_181 > 0] <- 1
  dat$output[dat$future_lag_181 < 0] <- 0

  many_test_samples_eurusd_categorical[[j]] <- subset(dat, is.na(dat$output) == F)[, 110:218]
  many_test_samples_eurusd_categorical[[j]] <- many_test_samples_eurusd_categorical[[j]][(nrow(dat) / 5):(2 * nrow(dat) / 5), ]

}

correct_validation_results <- data.frame()

for (i in 1:49){

  dat_valid <- many_test_samples_eurusd_categorical[[i]][, c("lag_diff_45_var"
                                                             , "lag_diff_128_var"
                                                             , "lag_max_diff_8_var"
                                                             , "lag_max_diff_11_var"
                                                             , "lag_max_diff_724_var"
                                                             , "lag_sd_362_var"
                                                             , "output")]

  dat_valid$concat <- do.call(paste0, dat_valid[1:(ncol(dat_valid) - 1)])

  y <- as.data.frame.matrix(table(dat_valid$concat, dat_valid$output))
  y$concat <- rownames(y)

  # Keep only the patterns that are present in the rule book
  valid_result <- merge(x = y, y = trained_model[, 4:5], by.x = 'concat', by.y = 'concat')

  correct_sell <- sum(subset(valid_result, valid_result$direction == 0)[, 2])
  correct_buys <- sum(subset(valid_result, valid_result$direction == 1)[, 3])

  # V1..V5: correct sells, correct buys, total correct deals, total deals, share correct
  correct_validation_results[i, 1] <- correct_sell
  correct_validation_results[i, 2] <- correct_buys
  correct_validation_results[i, 3] <- sum(correct_sell, correct_buys)
  correct_validation_results[i, 4] <- sum(valid_result[, 2:3])
  correct_validation_results[i, 5] <- correct_validation_results[i, 3] / correct_validation_results[i, 4]

}

hist(correct_validation_results$V5, breaks = 10)

plot(correct_validation_results$V5, type = 's')

sum(correct_validation_results$V3) / sum(correct_validation_results$V4)

Next come 49 validation samples, each covering roughly 5 years. Let's validate the model on them and count the percentage of correctly guessed trade directions.

Let's look at the percentage of correctly guessed trades per sample and at the histogram of this value:

And count the overall share of correctly guessed trade directions across all samples:

> sum(correct_validation_results$`total correct deals`) / sum(correct_validation_results$`total deals`)

[1] 0.5361318

About 54%. But this is without taking into account the fact that we need to overcome the distance between Ask and Bid. That is, the break-even threshold, judging by the chart above, is about 53%, provided the spread is 1 point.

In other words, we have put together in 30 minutes a simple model that is easy to hardcode in the terminal, for example. And it is not even a committee. And I searched for dependencies for 20 minutes instead of 20 hours. All in all, there is something here.
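To illustrate how such a rule book could be applied to a fresh bar, here is a sketch built on the trained_model data frame from the code above; new_bar_levels is a hypothetical vector of the six categorical predictor values for the current bar, not real data.

# Hypothetical levels of the six predictors on the current bar
new_bar_levels <- c(3, 3, 3, 3, 3, 2)
new_concat <- paste0(new_bar_levels, collapse = "")

match_row <- trained_model[trained_model$concat == new_concat, ]
if (nrow(match_row) == 1){
  signal <- ifelse(match_row$direction == 1, "buy", "sell")
} else {
  signal <- "no trade"   # pattern not in the rule book: stay out of the market
}
signal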

And all thanks to the correct selection of informative features.

And here are the detailed statistics for each validation sample.

sample   correct sell   correct buy   total correct deals   total deals   share correct
1 37 10 47 85 0.5529412
2 26 7 33 65 0.5076923
3 30 9 39 80 0.4875
4 36 11 47 88 0.5340909
5 33 12 45 90 0.5
6 28 10 38 78 0.4871795
7 30 9 39 75 0.52
8 34 8 42 81 0.5185185
9 24 11 35 67 0.5223881
10 23 14 37 74 0.5
11 28 13 41 88 0.4659091
12 31 13 44 82 0.5365854
13 33 9 42 80 0.525
14 23 7 30 63 0.4761905
15 28 12 40 78 0.5128205
16 23 16 39 72 0.5416667
17 30 13 43 74 0.5810811
18 38 8 46 82 0.5609756
19 26 8 34 72 0.4722222
20 35 12 47 79 0.5949367
21 32 11 43 76 0.5657895
22 30 10 40 75 0.5333333
23 28 8 36 70 0.5142857
24 21 8 29 70 0.4142857
25 24 8 32 62 0.516129
26 34 15 49 83 0.5903614
27 24 9 33 63 0.5238095
28 26 14 40 66 0.6060606
29 35 6 41 84 0.4880952
30 28 8 36 74 0.4864865
31 26 14 40 79 0.5063291
32 31 15 46 88 0.5227273
33 35 14 49 93 0.5268817
34 35 19 54 85 0.6352941
35 27 8 35 64 0.546875
36 30 10 40 83 0.4819277
37 36 9 45 79 0.5696203
38 25 8 33 73 0.4520548
39 39 12 51 85 0.6
40 37 9 46 79 0.5822785
41 41 12 53 90 0.5888889
42 29 7 36 59 0.6101695
43 36 14 50 77 0.6493506
44 36 15 51 88 0.5795455
45 34 7 41 67 0.6119403
46 28 12 40 75 0.5333333
47 27 11 38 69 0.5507246
48 28 16 44 83 0.5301205
49 29 10 39 72 0.5416667
FOLLOW-UP OF THE FOREX DATA ANALYSIS EXPERIMENT: proof of the significance of the predictions
  • 2016.03.02
  • Alexey Burnakov
  • www.mql5.com
The beginning is at these links: https://www.mql5.com/ru/blogs/post/659572 https://www.mql5.com/ru/blogs/post/659929 https://www.mql5.com/ru/blogs/post/660386 https://www.mql5.com/ru/blogs/post/661062
 

All the raw data is available at the links in the blog.

And this model is hardly very profitable. The expected payoff (MO) is around half a point. But that is the direction I am heading in.

 
SanSanych Fomenko:

Always learning from the past.

We have been looking at the chart for centuries. We look at it this way and that, and we see "three soldiers", then we see a "head and shoulders". How many such figures we have already seen; we believe in these figures, and we trade them...

And if the task is set as follows:

1. automatically find such figures - not on all charts, but for a particular currency pair, and those that have occurred recently, not three centuries ago with the Japanese trading rice;

2. decide on the initial data on which we automatically search for such figures - patterns.

To answer the first question, let us consider an algorithm called "random forest". It takes 10, 50, 100, 200 ... input variables. Then it takes the entire set of values of those variables at one point in time, corresponding to one bar, and searches for combinations of the input variables that, on the historical data, correspond to a quite definite result - for example, a BUY order. And another set of combinations for the other order - SELL. A separate tree corresponds to each such set. Experience shows that the algorithm finds 200-300 trees for an input set of 18,000 bars (about 3 years). This is the set of patterns - near-analogues of "heads and shoulders" and whole companies of "soldiers".
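A minimal sketch of that idea, assuming the randomForest package; the data frame bars and its columns are synthetic stand-ins, not the actual inputs discussed here.

library(randomForest)

# Synthetic stand-in: one row per bar, 10 input variables and a BUY/SELL label
set.seed(42)
n_bars <- 18000
bars <- data.frame(matrix(rnorm(n_bars * 10), ncol = 10))
bars$order <- factor(sample(c("BUY", "SELL"), n_bars, replace = TRUE))

# Each tree encodes one combination of input-variable splits - one "pattern"
rf <- randomForest(order ~ ., data = bars, ntree = 300, importance = TRUE)

print(rf)        # out-of-bag error estimate and confusion matrix
varImpPlot(rf)   # which inputs the trees actually rely on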

The problem with this algorithm is that such trees can pick up specifics that will not recur in the future. This is called "super-fitting" here on the forum and "overfitting" in machine learning. It is known that the whole large set of input variables can be divided into two parts: those related to the output variable and those unrelated to it - noise. So Burnakov tries to weed out the ones that are irrelevant to the output.

PS.

When building a trend TS (BUY, SELL) any kind of variables are related to noise!

What you see is a small part of the market and it's not the most important. Nobody builds the pyramid upside down.
 
yerlan Imangeldinov:
What you see is a small part of the market and not the most important. No one is building a pyramid upside down.
And specifically, what don't I see?
 
yerlan Imangeldinov:
What you see is a small part of the market and not the most important. No one is building a pyramid upside down.
You can add information to the system besides the price history. But you still have to train on the history. Or else - rely on intuition.
 

I tried to train a neural network on the input data, then looked at the weights. If an input has a small weight, it seems it is not needed. I did it via R (Rattle), thanks to SanSanych for his article https://www.mql5.com/ru/articles/1165.

input      weight       subset
input_1    -186.905     yes
input_2       7.954625
input_3    -185.245     yes
input_4      14.88457
input_5    -206.037     yes
input_6      16.03497
input_7     190.0939    yes
input_8      23.05248
input_9    -182.923     yes
input_10      4.268967
input_11    196.8927    yes
input_12     16.43655
input_13      5.419367
input_14      8.76542
input_15     36.8237
input_16      5.940322
input_17      8.304859
input_18      8.176511
input_19     17.13691
input_20     -0.57317

I have not tested this approach in practice; I wonder whether it works or not. I would take input_1, input_3, input_5, input_7, input_9, input_11.

Random forests predict trends
  • 2014.09.29
  • SanSanych Fomenko
  • www.mql5.com
The article describes the use of the Rattle package for the automatic search of patterns capable of predicting "longs" and "shorts" for Forex currency pairs. The article will be useful both to beginners and to experienced traders.
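A minimal sketch of this kind of weight inspection, assuming the nnet package (which Rattle uses for its neural network); the training data here is synthetic, and summing absolute input-to-hidden weights is just one rough way to rank inputs, not the author's exact procedure.

library(nnet)

# Synthetic stand-in: 20 inputs and a binary target
set.seed(1)
trainData <- data.frame(matrix(rnorm(500 * 20), ncol = 20))
names(trainData) <- paste0("input_", 1:20)
trainData$target <- factor(sample(0:1, 500, replace = TRUE))

# Single-hidden-layer network, roughly what Rattle builds
nn <- nnet(target ~ ., data = trainData, size = 10, decay = 0.1, maxit = 200, trace = FALSE)

# nnet stores weights unit by unit: for each hidden unit, its bias followed by the 20 input weights
w <- matrix(nn$wts[1:((20 + 1) * 10)], nrow = 20 + 1)
input_weight <- rowSums(abs(w))[-1]          # drop the bias row
names(input_weight) <- paste0("input_", 1:20)
sort(input_weight, decreasing = TRUE)        # inputs ranked by total absolute weight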
 
Dr.Trader:

I tried to train a neural network on the input data, then looked at the weights. If an input has a small weight, it seems it is not needed. I did it with R (Rattle), thanks to SanSanych for his article https://www.mql5.com/ru/articles/1165.

input      weight       subset
input_1    -186.905     yes
input_2       7.954625
input_3    -185.245     yes
input_4      14.88457
input_5    -206.037     yes
input_6      16.03497
input_7     190.0939    yes
input_8      23.05248
input_9    -182.923     yes
input_10      4.268967
input_11    196.8927    yes
input_12     16.43655
input_13      5.419367
input_14      8.76542
input_15     36.8237
input_16      5.940322
input_17      8.304859
input_18      8.176511
input_19     17.13691
input_20     -0.57317

I have not tested this approach in practice; I wonder whether it works or not. I would take input_1, input_3, input_5, input_7, input_9, input_11.

) Hmm, this is very interesting.

A clarifying question: why don't you then include some more inputs where the weight is small, such as 13, 14, 16? Could you show a diagram of the inputs and weights, ordered by weight?

Sorry, I didn't understand at first. Yes, these inputs have a large weight in absolute value, as they should.

 

Visually, all the weights fall into two groups. If you need to divide them by the significant/non-significant principle, then 5, 11, 7, 1, 3, 9 clearly stand out; I think this set is enough.
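For reference, the kind of diagram asked about above takes only a few lines; the weights are copied from the table in the post.

# Weights from the table above, in input_1 .. input_20 order
w <- c(-186.905, 7.954625, -185.245, 14.88457, -206.037, 16.03497, 190.0939,
       23.05248, -182.923, 4.268967, 196.8927, 16.43655, 5.419367, 8.76542,
       36.8237, 5.940322, 8.304859, 8.176511, 17.13691, -0.57317)
names(w) <- paste0("input_", 1:20)

# Order by absolute value and plot: the two groups are clearly visible
barplot(sort(abs(w), decreasing = TRUE), las = 2, main = "Inputs ordered by |weight|")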
