Date: 2017-09-11

Deep neural networks (DNN) are widely used and intensely developed in many areas. The most common everyday examples are speech and image recognition and automatic translation from one language into another. DNN are also used in trading. Given the fast development of algorithmic trading, an in-depth study of DNN seems useful.

Lately, developers have come up with many new ideas, methods and approaches to the use of DNN and proved them experimentally. This series of articles will consider the state and the main directions of the development of DNN. A lot of space will be dedicated to testing various ideas and methods using practical experiments alongside qualitative characteristics of DNN. In our work we will be using only multilayer fully connected networks.

The articles will have four focus areas:

Preparation, evaluation and amplification of the input data by various transformations.



New capabilities of the darch package (v.0.12). Flexibility and extended functionality.

The use of prediction result amplification (optimization of hyperparameters of DNN and ensembles of neural networks).

Graphical capabilities for controlling the work of an Expert Advisor both during learning and work.

This article will consider preparing data received in the trading terminal for use in the neural network.

Introduction

Development, training and testing of a deep neural network are done in stages that follow a strict sequence. As with any machine learning model, the process of creating a DNN can be split into two unequal parts:

preparing input and output data for experiments;

creating, training, testing and optimizing parameters of the DNN.

The first stage takes the bigger part of the project time, about 70%. The performance of a DNN largely depends on the success of this stage. After all, garbage in, garbage out. This is why we will describe the sequence of actions at this stage in detail.

To repeat the experiments, you will need to install Microsoft R Open (MRO) 3.4.0 and RStudio. Instructions for installing this software can easily be found on the internet. The files attached to this article contain this information too, so we are not going to consider it in detail.

The R language

Let us recall some important things about R. It is a programming language and environment for statistical computing and graphics. It was introduced in 1996 by Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand. R is a GNU project, that is, free and open source software. The approach to using such software comes down to the following principles (freedoms):

the freedom to launch programs for any purpose (freedom 0);

the freedom to study how the program works and adapt it to the programmer's needs (freedom 1);

the freedom to distribute copies so you can help your neighbor (freedom 2);

the freedom to improve the program and distribute the modified version to benefit the whole community with the change (freedom 3).

Today R is improved and developed mainly by the "R Development Core Team" and the recently founded R Consortium. The list of consortium members (IBM, Microsoft, RStudio, Google, Mango, Oracle and others) indicates good support, significant interest and good prospects for the language.

Advantages of R:

Today, R is the standard in statistical computing.

It is supported and developed by the world's scientific community.

A wide set of packages covering all advanced directions in data mining. It must be mentioned that the time between the publication of a new idea by scientists and its implementation in R is short, often no more than about two weeks.

And, last but not least, it is absolutely free.



1. Creating initial (raw) data

"All of the previous, current and future price movements are in the price itself"

There are many methods (packages) designed for preliminary preparation, evaluation and selection of predictors. A review of these methods can be found in [1]. Their variety is explained by the diversity of real world data. The type of data in use defines the methods of exploring and processing it.

We are exploring financial data. These are hierarchical, regular time series which are infinite and can be easily extracted. The base series is the OHLCV quotes for the instrument on a specific timeframe.

All other time series are derived from this base series:

nonparametric. For example, x^2, sqrt(abs(x)), x^3, -x^2 etc.

functional nonparametric. For example, sin(2*n*x), ln(abs(x)), log(Pr(t)/Pr(t-1)) etc.

parametric. This group includes various indicators, which are mainly used as predictors. They can be oscillators or different kinds of filters.

Either indicators generating signals (factors) or a sequence of conditional statements producing a signal can be used as the goal variable.
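The three kinds of derived series listed above can be illustrated with a minimal base R sketch. The price values and the window length n are sample values chosen for illustration, and the moving average only stands in for a "parametric" indicator; it is not one of the predictors used later in the article.

```r
# A few sample close prices (illustrative values only)
close <- c(122.859, 122.848, 122.720, 122.693, 122.818, 122.920)

# Functional transformation: log returns, log(Pr(t) / Pr(t-1)).
# c(NA, ...) keeps the result the same length as the input.
log_ret <- c(NA, diff(log(close)))

# Simple power transformations of the returns
sq   <- log_ret^2
root <- sqrt(abs(log_ret))

# Parametric transformation: a simple moving average whose window n
# is the parameter (stats::filter ships with base R)
n   <- 3
sma <- as.numeric(stats::filter(close, rep(1 / n, n), sides = 1))

derived <- data.frame(close, log_ret, sq, root, sma)
print(derived)
```

The first n - 1 values of sma are NA because a full window is not yet available there.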

1.1. Quotes

We get the OHLC quotes, volume and time from the terminal as the vectors (o, h, l, cl, v, d). We need to write a function that joins these vectors into a dataframe. In the process, the start time of the bar is converted to the POSIXct format.

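#---pr.OHLCV-------------------
pr.OHLCV <- function(d, o, h, l, cl, v){
  # (d, o, h, l, cl, v) - vectors received from the terminal
  require('magrittr')
  require('dplyr')
  require('anytime')
  # rev() reverses each vector before the columns are bound together
  price <- cbind(Data = rev(d), Open = rev(o), High = rev(h),
                 Low = rev(l), Close = rev(cl), Vol = rev(v)) %>%
    as_tibble()  # the original called as.tibble(), the older name of this function
  # convert the bar start time to POSIXct
  price$Data %<>% anytime(., tz = "CET")
  return(price)
}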
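For readers who want to try the same step without extra packages, here is a minimal base R sketch. It assumes that the terminal passes the bar time as Unix seconds and that the vectors arrive newest first (which the rev() calls above suggest); the function name pr_base and the three sample bars are hypothetical.

```r
# Hypothetical input: three bars, newest first, time as Unix seconds (assumption)
d  <- c(1484044200, 1484043300, 1484042400)
o  <- c(122.850, 122.860, 122.758)
h  <- c(122.856, 122.924, 122.893)
l  <- c(122.705, 122.818, 122.746)
cl <- c(122.720, 122.848, 122.859)
v  <- c(3220, 3360, 3830)

# Base R counterpart of pr.OHLCV(): reverse to chronological order,
# convert the time to POSIXct and collect everything in a data.frame
pr_base <- function(d, o, h, l, cl, v, tz = "CET") {
  data.frame(
    Data  = as.POSIXct(rev(d), origin = "1970-01-01", tz = tz),
    Open  = rev(o),
    High  = rev(h),
    Low   = rev(l),
    Close = rev(cl),
    Vol   = rev(v)
  )
}

price <- pr_base(d, o, h, l, cl, v)
print(price)
```

Keeping the columns in a data.frame from the start, rather than going through cbind(), avoids the intermediate numeric matrix.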

As the quote vectors have been loaded in the environment env, let us calculate the dataFrame pr and clear the environment env from unused variables:

evalq({
  pr <- pr.OHLCV(Data, Open, High, Low, Close, Volume)
  rm(list = c("Data", "Open", "High", "Low", "Close", "Volume"))
}, env)

We want to see how this dataframe looks at the beginning:

> head(env$pr)
# A tibble: 6 x 6
                 Data    Open    High     Low   Close
               <dttm>   <dbl>   <dbl>   <dbl>   <dbl>
1 2017-01-10 11:00:00 122.758 122.893 122.746 122.859
2 2017-01-10 11:15:00 122.860 122.924 122.818 122.848
3 2017-01-10 11:30:00 122.850 122.856 122.705 122.720
4 2017-01-10 11:45:00 122.721 122.737 122.654 122.693
5 2017-01-10 12:00:00 122.692 122.850 122.692 122.818
6 2017-01-10 12:15:00 122.820 122.937 122.785 122.920
# ... with 1 more variables: Vol <dbl>

and at the end:

> tail(env$pr)
# A tibble: 6 x 6
                 Data    Open    High     Low   Close
               <dttm>   <dbl>   <dbl>   <dbl>   <dbl>
1 2017-05-05 20:30:00 123.795 123.895 123.780 123.888
2 2017-05-05 20:45:00 123.889 123.893 123.813 123.831
3 2017-05-05 21:00:00 123.833 123.934 123.825 123.916
4 2017-05-05 21:15:00 123.914 123.938 123.851 123.858
5 2017-05-05 21:30:00 123.859 123.864 123.781 123.781
6 2017-05-05 21:45:00 123.779 123.864 123.781 123.781
# ... with 1 more variables: Vol <dbl>

So, there are 8000 bars, starting on 10.01.2017 and ending on 05.05.2017. Let us add price derivatives to the dataframe pr: Median Price, Typical Price and Weighted Close.

evalq(pr %<>% mutate(.,
                     Med = (High + Low)/2,
                     Typ = (High + Low + Close)/3,
                     Wg  = (High + Low + 2 * Close)/4,
                     #CO = Close - Open,
                     #HO = High - Open,
                     #LO = Low - Open,
                     dH = c(NA, diff(High)),
                     dL = c(NA, diff(Low))
                     ),
      env)
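As a quick worked check of these formulas, the sketch below applies them in base R to the first two bars from the head() output shown earlier. Only the variable names are new.

```r
# High, Low and Close of the first two bars from the head() output
high  <- c(122.893, 122.924)
low   <- c(122.746, 122.818)
close <- c(122.859, 122.848)

med <- (high + low) / 2              # Med
typ <- (high + low + close) / 3      # Typ
wg  <- (high + low + 2 * close) / 4  # Wg

# First differences; the leading NA keeps the vectors aligned with the bars
dH <- c(NA, diff(high))
dL <- c(NA, diff(low))

print(data.frame(med, typ, wg, dH, dL))
```

The second values of dH and dL (0.031 and 0.072) agree with the structure of pr printed in section 1.4.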

1.2. Predictors

We are going to work with a set of simplified predictors. Digital filters FATL, SATL, RFTL, RSTL will play that role. They are described in detail in the article by V. Kravchuk "New Adaptive Method of Following the Tendency and Market Cycles", which can be found in the files attached to this article (see the chapter "New Tools of Technical Analysis and their Interpretation"). Here I will just list them.

FATL (Fast Adaptive Trend Line);

SATL (Slow Adaptive Trend Line);

RFTL (Reference Fast Trend Line);

RSTL (Reference Slow Trend Line).



The rate of change of FATL and SATL can be monitored using the FTLM (Fast Trend Line Momentum) and STLM (Slow Trend Line Momentum) indicators.

We will also need two oscillators from this set of technical tools: the RBCI and PCCI indices. The RBCI (Range Bound Channel Index) is a band-limited channel index which is calculated by means of a band-pass filter. The filter removes the low frequency trend and the high frequency noise. The PCCI (Perfect Commodity Channel Index) is calculated below as the difference between the close price and FATL.

The function calculating digital filters FATL, SATL, RFTL, RSTL looks as follows:

#-----DigFiltr-------------------------
DigFiltr <- function(X, type = 1){
  # X - vector
  require(rowr)
  fatl <- c(
    +0.4360409450, +0.3658689069, +0.2460452079, +0.1104506886, -0.0054034585,
    -0.0760367731, -0.0933058722, -0.0670110374, -0.0190795053, +0.0259609206,
    +0.0502044896, +0.0477818607, +0.0249252327, -0.0047706151, -0.0272432537,
    -0.0338917071, -0.0244141482, -0.0055774838, +0.0128149838, +0.0226522218,
    +0.0208778257, +0.0100299086, -0.0036771622, -0.0136744850, -0.0160483392,
    -0.0108597376, -0.0016060704, +0.0069480557, +0.0110573605, +0.0095711419,
    +0.0040444064, -0.0023824623, -0.0067093714, -0.0072003400, -0.0047717710,
    0.0005541115, 0.0007860160, 0.0130129076, 0.0040364019)
  rftl <- c(
    -0.0025097319, +0.0513007762, +0.1142800493, +0.1699342860, +0.2025269304,
    +0.2025269304, +0.1699342860, +0.1142800493, +0.0513007762, -0.0025097319,
    -0.0353166244, -0.0433375629, -0.0311244617, -0.0088618137, +0.0120580088,
    +0.0233183633, +0.0221931304, +0.0115769653, -0.0022157966, -0.0126536111,
    -0.0157416029, -0.0113395830, -0.0025905610, +0.0059521459, +0.0105212252,
    +0.0096970755, +0.0046585685, -0.0017079230, -0.0063513565, -0.0074539350,
    -0.0050439973, -0.0007459678, +0.0032271474, +0.0051357867, +0.0044454862,
    +0.0018784961, -0.0011065767, -0.0031162862, -0.0033443253, -0.0022163335,
    +0.0002573669, +0.0003650790, +0.0060440751, +0.0018747783)
  satl <- c(
    +0.0982862174, +0.0975682269, +0.0961401078, +0.0940230544, +0.0912437090,
    +0.0878391006, +0.0838544303, +0.0793406350, +0.0743569346, +0.0689666682,
    +0.0632381578, +0.0572428925, +0.0510534242, +0.0447468229, +0.0383959950,
    +0.0320735368, +0.0258537721, +0.0198005183, +0.0139807863, +0.0084512448,
    +0.0032639979, -0.0015350359, -0.0059060082, -0.0098190256, -0.0132507215,
    -0.0161875265, -0.0186164872, -0.0205446727, -0.0219739146, -0.0229204861,
    -0.0234080863, -0.0234566315, -0.0231017777, -0.0223796900, -0.0213300463,
    -0.0199924534, -0.0184126992, -0.0166377699, -0.0147139428, -0.0126796776,
    -0.0105938331, -0.0084736770, -0.0063841850, -0.0043466731, -0.0023956944,
    -0.0005535180, +0.0011421469, +0.0026845693, +0.0040471369, +0.0052380201,
    +0.0062194591, +0.0070340085, +0.0076266453, +0.0080376628, +0.0083037666,
    +0.0083694798, +0.0082901022, +0.0080741359, +0.0077543820, +0.0073260526,
    +0.0068163569, +0.0062325477, +0.0056078229, +0.0049516078, +0.0161380976)
  rstl <- c(
    -0.0074151919, -0.0060698985, -0.0044979052, -0.0027054278, -0.0007031702, +0.0014951741,
    +0.0038713513, +0.0064043271, +0.0090702334, +0.0118431116, +0.0146922652, +0.0175884606,
    +0.0204976517, +0.0233865835, +0.0262218588, +0.0289681736, +0.0315922931, +0.0340614696,
    +0.0363444061, +0.0384120882, +0.0402373884, +0.0417969735, +0.0430701377, +0.0440399188,
    +0.0446941124, +0.0450230100, +0.0450230100, +0.0446941124, +0.0440399188, +0.0430701377,
    +0.0417969735, +0.0402373884, +0.0384120882, +0.0363444061, +0.0340614696, +0.0315922931,
    +0.0289681736, +0.0262218588, +0.0233865835, +0.0204976517, +0.0175884606, +0.0146922652,
    +0.0118431116, +0.0090702334, +0.0064043271, +0.0038713513, +0.0014951741, -0.0007031702,
    -0.0027054278, -0.0044979052, -0.0060698985, -0.0074151919, -0.0085278517, -0.0094111161,
    -0.0100658241, -0.0104994302, -0.0107227904, -0.0107450280, -0.0105824763, -0.0102517019,
    -0.0097708805, -0.0091581551, -0.0084345004, -0.0076214397, -0.0067401718, -0.0058083144,
    -0.0048528295, -0.0038816271, -0.0029244713, -0.0019911267, -0.0010974211, -0.0002535559,
    +0.0005231953, +0.0012297491, +0.0018539149, +0.0023994354, +0.0028490136, +0.0032221429,
    +0.0034936183, +0.0036818974, +0.0038037944, +0.0038338964, +0.0037975350, +0.0036986051,
    +0.0035521320, +0.0033559226, +0.0031224409, +0.0028550092, +0.0025688349, +0.0022682355,
    +0.0073925495)
  if (type == 1) {k = fatl}
  if (type == 2) {k = rftl}
  if (type == 3) {k = satl}
  if (type == 4) {k = rstl}
  n <- length(k)
  m <- length(X)
  k <- rev(k)
  f <- rowr::rollApply(data = X,
                       fun = function(x) {sum(x * k)},
                       window = n, minimum = n, align = "right")
  while (length(f) < m) { f <- c(NA,f)}
  return(f)
}
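The rolling weighted sum in DigFiltr can also be written with stats::filter from base R, which may be convenient if the rowr package is not available in your installation. The sketch below is an assumption-labelled equivalent, not the author's code: fir_right is a hypothetical helper and the 3-tap kernel is made up for illustration. As in DigFiltr, the first coefficient weights the most recent value.

```r
# Hypothetical 3-tap kernel; k[1] applies to the newest value in the window
k <- c(0.5, 0.3, 0.2)
x <- c(10, 11, 13, 12, 15, 14)

# One-sided convolution: y[t] = k[1]*x[t] + k[2]*x[t-1] + ...
# The first length(k) - 1 values are NA, like the NA padding in DigFiltr
fir_right <- function(x, k) {
  as.numeric(stats::filter(x, filter = k, method = "convolution", sides = 1))
}

f <- fir_right(x, k)

# Reference: explicit right-aligned rolling window with the reversed kernel,
# which is the calculation DigFiltr performs
n   <- length(k)
ref <- rep(NA_real_, length(x))
for (t in n:length(x)) {
  ref[t] <- sum(x[(t - n + 1):t] * rev(k))
}

print(cbind(x, f, ref))
```

With a real kernel, fir_right(Close, fatl) would be expected to match DigFiltr(Close, 1), since both compute the same weighted sum over the last length(k) values.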

After they have been calculated, add them to the dataframe pr

evalq(pr %<>% mutate(.,
                     fatl = DigFiltr(Close, 1),
                     rftl = DigFiltr(Close, 2),
                     satl = DigFiltr(Close, 3),
                     rstl = DigFiltr(Close, 4)
                     ),
      env)

Add oscillators FTLM, STLM, RBCI, PCCI, their first differences and the first differences of the digital filters to the dataframe pr:

evalq(pr %<>% mutate(.,
                     ftlm = fatl - rftl,
                     rbci = fatl - satl,
                     stlm = satl - rstl,
                     pcci = Close - fatl,
                     v.fatl = c(NA, diff(fatl)),
                     v.rftl = c(NA, diff(rftl)),
                     v.satl = c(NA, diff(satl)),
                     v.rstl = c(NA, diff(rstl)*10)
                     ),
      env)
evalq(pr %<>% mutate(.,
                     v.ftlm = c(NA, diff(ftlm)),
                     v.stlm = c(NA, diff(stlm)),
                     v.rbci = c(NA, diff(rbci)),
                     v.pcci = c(NA, diff(pcci))
                     ),
      env)

1.3. Goal variable

ZigZag() will be used as the indicator generating the goal variable.

The function for its calculation receives the time series and two parameters: the minimal length of a bend (int or double) and the price type used for the calculation (Close, Med, Typ, Wg, or the pair (High, Low)).

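#------ZZ-----------------------------------
par <- c(25, 5)
ZZ <- function(x, par) {
  # x - dataframe with the price columns
  # par - c(minimal bend, price type)
  require(TTR)
  require(magrittr)
  require(dplyr)  # needed for select()
  ch = par[1]
  mode = par[2]
  # Dig (number of decimal places in the quote) must be defined beforehand
  if (ch > 1) ch <- ch/(10 ^ (Dig - 1))
  switch(mode,
         xx <- x$Close,
         xx <- x$Med,
         xx <- x$Typ,
         xx <- x$Wg,  # the column is named Wg in pr (the original referred to Wd)
         xx <- x %>% select(High, Low))
  zz <- ZigZag(xx, change = ch, percent = F, retrace = F, lastExtreme = T)
  # replace NA values with the previous value
  n <- 1:length(zz)
  for (i in n) { if (is.na(zz[i])) zz[i] = zz[i - 1]}
  return(zz)
}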
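Two details of ZZ() are easy to check in isolation: the conversion of the bend from points to price units, and the filling of NA values. The sketch below uses base R only. Dig = 3 is an assumption based on the three decimal places of the quotes shown earlier, and fill_forward and the sample zz vector are hypothetical.

```r
# Assumption: the quote has 3 decimal places, as in 122.758
Dig <- 3
ch  <- 25
if (ch > 1) ch <- ch / (10^(Dig - 1))  # 25 points become 0.25 in price units

# Forward-fill NA values. Starting at the second element leaves a
# leading NA in place instead of reading the non-existent z[0]
fill_forward <- function(z) {
  for (i in seq_along(z)[-1]) {
    if (is.na(z[i])) z[i] <- z[i - 1]
  }
  z
}

zz     <- c(NA, 122.70, 122.75, NA, NA, 122.90, NA)  # made-up values
filled <- fill_forward(zz)
print(ch)
print(filled)
```

The loop in ZZ() starts at index 1, so it would stop with an error if the very first ZigZag value were NA; starting from the second element is one way to guard against that.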

Calculate ZigZag, the first difference, the sign of the first difference and add them to the dataframe pr:

evalq(pr %<>% cbind(., zigz = ZZ(., par = par)), env)
evalq(pr %<>% cbind(., dz = diff(pr$zigz) %>% c(NA, .)), env)
evalq(pr %<>% cbind(., sig = sign(pr$dz)), env)

1.4. Initial data set

Let us sum up what data we should have as a result of calculations.

From the terminal we received the OHLCV vectors and the timestamp of the start of each bar on the M15 timeframe for EURJPY. These data formed the pr dataframe. The variables FATL, SATL, RFTL, RSTL, FTLM, STLM, RBCI, PCCI and their first differences were added to this dataframe. ZigZag with a minimal bend of 25 points (4 decimal places), its first difference and the sign of the first difference (-1, 1), which will be used as the signal, were added to the dataframe too.

All these data were loaded not into the global environment but into a new child environment env, where all the calculations will be carried out. This division will allow using data sets from different symbols or timeframes without name conflicts during calculation.
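A minimal sketch of this separation in base R: the same expression is evaluated in two environments, each holding its own Close vector. The second symbol and all price values are hypothetical.

```r
# One environment per symbol; both hold a variable with the same name
env_eurjpy <- new.env()
env_eurusd <- new.env()  # hypothetical second symbol

assign("Close", c(122.859, 122.848, 122.720), envir = env_eurjpy)
assign("Close", c(1.0572, 1.0580, 1.0569), envir = env_eurusd)  # made-up values

# The same code runs in each environment and sees only that
# environment's Close; the result ret is stored there as well
evalq(ret <- c(NA, diff(log(Close))), env_eurjpy)
evalq(ret <- c(NA, diff(log(Close))), env_eurusd)

print(env_eurjpy$ret)
print(env_eurusd$ret)
```

Because ret is created inside each environment, neither result overwrites the other, and nothing is added to the calling environment.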

The structure of the resulting dataframe pr is shown below. The variables required for the following calculations can be easily extracted from it.

str(env$pr)
'data.frame':	8000 obs. of  30 variables:
 $ Data  : POSIXct, format: "2017-01-10 11:00:00" ...
 $ Open  : num  123 123 123 123 123 ...
 $ High  : num  123 123 123 123 123 ...
 $ Low   : num  123 123 123 123 123 ...
 $ Close : num  123 123 123 123 123 ...
 $ Vol   : num  3830 3360 3220 3241 3071 ...
 $ Med   : num  123 123 123 123 123 ...
 $ Typ   : num  123 123 123 123 123 ...
 $ Wg    : num  123 123 123 123 123 ...
 $ dH    : num  NA 0.031 -0.068 -0.119 0.113 ...
 $ dL    : num  NA 0.072 -0.113 -0.051 0.038 ...
 $ fatl  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ rftl  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ satl  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ rstl  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ ftlm  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ rbci  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ stlm  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ pcci  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.fatl: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.rftl: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.satl: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.rstl: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.ftlm: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.stlm: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.rbci: num  NA NA NA NA NA NA NA NA NA NA ...
 $ v.pcci: num  NA NA NA NA NA NA NA NA NA NA ...
 $ zigz  : num  123 123 123 123 123 ...
 $ dz    : num  NA -0.0162 -0.0162 -0.0162 -0.0162 ...
 $ sig   : num  NA -1 -1 -1 -1 -1 -1 -1 -1 -1 ...

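Select all previously calculated predictors from the pr dataframe into a new dataframe dataSet. Convert the goal variable sig into a factor and shift it one bar forward (into the future), so that each row of predictors is paired with the signal of the next bar.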

evalq(dataSet <- pr %>% tbl_df() %>%
        dplyr::select(Data, ftlm, stlm, rbci, pcci,
                      v.fatl, v.satl, v.rftl, v.rstl,
                      v.ftlm, v.stlm, v.rbci, v.pcci, sig) %>%
        dplyr::filter(., sig != 0) %>%
        mutate(., Class = factor(sig, ordered = F) %>% dplyr::lead()) %>%
        dplyr::select(-sig),
      env)
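To see what the filter and the one-bar shift do to the rows, here is a minimal base R sketch on a toy signal. The sig values and the stand-in predictor x are made up; the shift is written by hand as the counterpart of dplyr::lead().

```r
# Toy signal: NA on the first bar, one zero, otherwise -1 / 1 (made-up values)
sig <- c(NA, 1, 1, 0, -1, -1, 1, 1)
x   <- seq_along(sig)  # stand-in for a predictor column (bar number)

# Keep only rows with a non-zero signal; like dplyr::filter(),
# this also drops the row where sig is NA
keep <- !is.na(sig) & sig != 0
d <- data.frame(x = x[keep], sig = sig[keep])

# Shift the target one row up: row t gets the signal of row t + 1
d$Class <- factor(c(d$sig[-1], NA))
d$sig <- NULL

print(d)
```

The last row has no future signal, so its Class is NA; such rows have to be removed before the set is used for training.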

Visualizing data analysis

Draw the OHLC chart using the ggplot2 package together with tidyquant, which provides geom_candlestick() and theme_tq(). Take the data for the last two days (200 bars) and draw a candlestick chart of the quotes.

evalq(pr %>% tail(., 200) %>%
        ggplot(aes(x = Data, y = Close)) +
        geom_candlestick(aes(open = Open, high = High, low = Low, close = Close)) +
        labs(title = "EURJPY Candlestick Chart", y = "Close Price", x = "") +
        theme_tq(),
      env)

Fig.1. Chart of quotes

Draw the chart of FATL, SATL, RFTL, RSTL and ZZ:

evalq(pr %>% tail(., 200) %>%
        ggplot(aes(x = Data, y = Close)) +
        geom_candlestick(aes(open = Open, high = High, low = Low, close = Close)) +
        geom_line(aes(Data, fatl), color = "steelblue", size = 1) +
        geom_line(aes(Data, rftl), color = "red", size = 1) +
        geom_line(aes(Data, satl), color = "gold", size = 1) +
        geom_line(aes(Data, rstl), color = "green", size = 1) +
        geom_line(aes(Data, zigz), color = "black", size = 1) +
        labs(title = "EURJPY Candlestick Chart",
             subtitle = "Combining Chart Geoms",
             y = "Close Price", x = "") +
        theme_tq(),
      env)

Fig.2. FATL, SATL, RFTL, RSTL and ZZ

Split oscillators into three groups for more convenient representation.

require(dygraphs)
evalq(dataSet %>% tail(., 200) %>% tk_tbl %>%
        select(Data, ftlm, stlm, rbci, pcci) %>%
        tk_xts() %>%
        dygraph(., main = "Oscilator base") %>%
        dyOptions(., fillGraph = TRUE, fillAlpha = 0.2,
                  drawGapEdgePoints = TRUE,
                  colors = c("green", "violet", "red", "blue"),
                  digitsAfterDecimal = Dig) %>%
        dyLegend(show = "always", hideOnMouseOut = TRUE),
      env)

Fig.3. Base oscillators

evalq(dataSet %>% tail(., 200) %>% tk_tbl %>%
        select(Data, v.fatl, v.satl, v.rftl, v.rstl) %>%
        tk_xts() %>%
        dygraph(., main = "Oscilator 2") %>%
        dyOptions(., fillGraph = TRUE, fillAlpha = 0.2,
                  drawGapEdgePoints = TRUE,
                  colors = c("green", "violet", "red", "darkblue"),
                  digitsAfterDecimal = Dig) %>%
        dyLegend(show = "always", hideOnMouseOut = TRUE),
      env)

Fig.4. Oscillators of the second group

Oscillators of the third group will be drawn on the last 100 bars:

evalq(dataSet %>% tail(., 100) %>% tk_tbl %>%
        select(Data, v.ftlm, v.stlm, v.rbci, v.pcci) %>%
        tk_xts() %>%
        dygraph(., main = "Oscilator 3") %>%
        dyOptions(., fillGraph = TRUE, fillAlpha = 0.2,
                  drawGapEdgePoints = TRUE,
                  colors = c("green", "violet", "red", "darkblue"),
                  digitsAfterDecimal = Dig) %>%
        dyLegend(show = "always", hideOnMouseOut = TRUE),
      env)

Fig.5. Oscillators of the third group

