Machine learning in trading: theory, models, practice and algo-trading - page 267

 
mytarmailS:

Well, if instead of appending NA to the end of "Y" and then deleting that same NA, I just delete the last row of SomeData, won't the result be the same?

I really don't understand the difference; maybe I've just overheated for good :(

Exactly the same. I just don't do it that way, so I didn't understand your solution.
 
mytarmailS:
Sorry, but I don't know how; it just doesn't work no matter how many times I try. Try it yourself, you know which target I used, and report back on what you get.
Pack everything with any archiver and attach the file.
 
SanSanych Fomenko:
Pack everything with any archiver and attach the file.

I've tried many times; it doesn't work, it just throws me out of the forum and that's it...

Take my code, I posted all of it: the first script creates the features, the second trains the model. Yours will be exactly the same. Run the training and show what you got.

 
mytarmailS:

I've tried many times; it doesn't work, it just throws me out of the forum and that's it...

Take my code, I posted all of it: the first script creates the features, the second trains the model. Yours will be exactly the same. Run the training and show what you got.

OK. Let's do it this way: I will try it myself. Tomorrow.
 
mytarmailS:

When differencing, the shift happens automatically because the series becomes one element shorter; all you then need is to shorten the sample (the table of observations) by its last element.
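A minimal sketch of the two variants discussed here, assuming a toy one-column SomeData and a target Y built with diff(); the column name x and the values are made up for illustration. Padding Y with NA and then dropping the NA row should give the same table as simply dropping the last row:

```r
# Toy data: a single hypothetical price column
SomeData <- data.frame(x = c(10, 12, 11, 15, 14))

# Differencing yields one element fewer: Y[i] = x[i + 1] - x[i]
Y <- diff(SomeData$x)

# Variant A: pad the target with NA, bind it, then drop the incomplete row
a <- na.omit(cbind(SomeData, Y = c(Y, NA)))

# Variant B: drop the last observation, then bind the target
b <- cbind(SomeData[-nrow(SomeData), , drop = FALSE], Y = Y)

# Both variants contain the same rows
same_tables <- identical(a$x, b$x) && identical(a$Y, b$Y)
print(same_tables)
```

Variant B skips the NA bookkeeping, which is why the two approaches are interchangeable here.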

There are two things in the code that make me uneasy:

1) Since OHLC is used for the prediction, the very last bar cannot be used as an input: we make the prediction at the beginning of a bar, and its high, low and close keep changing for the whole lifetime of the bar. It turns out that we train the model on a fully formed last bar, and then in live trading we predict from an unformed one. You can't do that. You should shift the target by 2 bars, not 1.
A shift of the target by 1 bar would be acceptable only if you used the open price alone for prediction, ignoring high, low and close.

2) You are using Close instead of Open for the target. Is that important for some strategy, or was it chosen arbitrarily? After all, we usually enter a trade at the beginning of a bar, and then on the next new bar we either reverse, hold, or exit. So for the trained model it is important to predict the price change from the current open to the next open.
The close of the current bar does not necessarily coincide with the open of the next bar. By taking close as the target you can avoid the first error, but you may get wrong increments instead. I have just checked: in the price table the close usually does not coincide with the open of the next bar, so your target values are very doubtful.
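To make the two points concrete, here is a rough sketch on made-up prices (the vectors open and close are hypothetical, not real quotes). It builds the open-to-open target shifted by 2 bars, so that the features of a fully formed bar i are paired with the move from the open of bar i+1 to the open of bar i+2, and it measures the gap between each close and the next open:

```r
# Hypothetical bar data, for illustration only
open  <- c(100, 101, 103, 102, 105, 104)
close <- c(101, 102, 102, 106, 104, 107)
n <- length(open)

# Features come from the completed bar i; the trade is entered at the open of
# bar i + 1 and evaluated at the open of bar i + 2
i <- 1:(n - 2)
target_open <- open[i + 2] - open[i + 1]

# Gap between the close of a bar and the open of the next one;
# non-zero values mean a close-based target measures a different move
gap <- open[-1] - close[-n]

print(target_open)
print(gap)
```

If gap were always zero, the close-to-close and open-to-open targets would be the same series shifted by one bar; the argument above is that in real price tables it usually is not.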

If candlesticks repaints (recalculates past values when new bars arrive), you have to work around it.
We take a moving window of the first hundred bars (100 is just an arbitrary number; for your code the minimum is 23 bars, and with fewer candlesticks throws errors), compute the indicator values, keep only the latest ones, find the corresponding target, and add it all to the final table. Then we shift the window forward by 1 bar and repeat. All this is dozens of times slower than computing everything at once, but it is more reliable. Afterwards you can compare both results and conclude whether or not there is repainting.
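The comparison described above can be sketched in base R. The two indicators below are hypothetical stand-ins, not functions from the candlesticks package: sma_causal looks only backwards, while sma_centered deliberately peeks two bars into the future. Recomputing each one on a window that ends at bar i and comparing with the all-at-once result should flag only the second:

```r
# Hypothetical indicators for illustration (not from candlesticks)
sma_causal   <- function(x, n = 5) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
sma_centered <- function(x, n = 5) as.numeric(stats::filter(x, rep(1 / n, n), sides = 2))

# Recompute the indicator on a window ending at bar i and keep only the last value
windowed <- function(x, indicator, depth) {
  out <- rep(NA_real_, length(x))
  for (i in (depth + 1):length(x)) {
    out[i] <- tail(indicator(x[(i - depth):i]), 1)
  }
  out
}

# TRUE only when both vectors agree, including the positions of NAs
same <- function(a, b) isTRUE(all.equal(a, b))

set.seed(1)
price <- cumsum(rnorm(200))   # synthetic random-walk prices
depth <- 23
idx <- (depth + 1):length(price)

causal_ok   <- same(sma_causal(price)[idx],   windowed(price, sma_causal, depth)[idx])
centered_ok <- same(sma_centered(price)[idx], windowed(price, sma_centered, depth)[idx])

print(c(causal = causal_ok, centered = centered_ok))
```

A mismatch, as with sma_centered here, would mean the all-at-once table contains values that could not have been known at bar i.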

It would also be good to change the size of the sliding window and see whether the results change. If they do, the indicator uses its own previous values to calculate new ones, and the window width then acts as something like an exponential period inside the indicator itself. In that case the window width should be increased until the results stop changing. The indicator may also have its own internal limit on the number of bars used; for example, zigzag in mt5 has this value set to 100 bars, so a window wider than a hundred does not affect the result.
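As an illustration of this window-width effect, here is a small sketch with a hand-written recursive EMA (the function ema and the smoothing factor 0.1 are assumptions chosen for the example, not anything from candlesticks). Because each value depends on the previous one, the last value computed on a short window differs from the full-history value, and the difference shrinks as the window grows:

```r
# Simple recursive EMA, seeded with the first value of whatever series it gets
ema <- function(x, alpha = 0.1) {
  out <- numeric(length(x))
  out[1] <- x[1]
  for (k in seq_along(x)[-1]) {
    out[k] <- alpha * x[k] + (1 - alpha) * out[k - 1]
  }
  out
}

# Latest EMA value when only the last (depth + 1) bars are visible
last_value <- function(x, depth) tail(ema(tail(x, depth + 1)), 1)

set.seed(2)
price <- 100 + cumsum(rnorm(500))   # synthetic prices

depths <- c(10, 25, 50, 100, 200)
vals   <- sapply(depths, function(d) last_value(price, d))
full   <- tail(ema(price), 1)

# Absolute deviation from the full-history value for each window width
err <- abs(vals - full)
print(data.frame(depth = depths, error = err))
```

For this EMA the influence of the window start decays as (1 - alpha)^depth, so in practice one would widen the window until the error falls below whatever tolerance matters for the model.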

First, try this code to create a training/test table. Compare it with your table: if everything matches 100%, there is no repainting and you can trust that candlesticks uses only previous values to compute new ones.
Then try increasing indicatorDepth (the sliding window width), see whether that changes the result, and whether you can find a value of indicatorDepth beyond which further increases no longer affect it.

if(!require(quantmod)){ install.packages("quantmod", dependencies = TRUE); library(quantmod) }
if(!require(rusquant)){ install.packages("rusquant", repos="http://r-forge.r-project.org", dependencies = TRUE); library(rusquant) }
if(!require(candlesticks)){ install.packages("candlesticks", repos="http://r-forge.r-project.org", dependencies = TRUE); library(candlesticks) }


tryCatch({
  load("SPFB.RTS.rdata")
}, error = function(e){
  getSymbols("SPFB.RTS", src = "Finam", period="5min", from = Sys.Date()-500)
  save(SPFB.RTS, finam.stock.list, file = "SPFB.RTS.rdata")
  SPFB.RTS <<- SPFB.RTS
  finam.stock.list <<- finam.stock.list
})

chart_Series(tail(SPFB.RTS, 100))


indicatorDepth <- 23

tryCatch({
  load("trainData.rdata")
}, error=function(e){
  trainData <<- matrix(NA, ncol=60, nrow = nrow(SPFB.RTS)-2)
  for(i in (indicatorDepth+1):(nrow(SPFB.RTS)-2)){
    X1  <- as.numeric(tail(CandleBodyLength(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X2  <- as.numeric(tail(CandleLength(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X3  <- as.numeric(tail(CSPDarkCloudCover(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X4  <- as.numeric(tail(CSPDoji(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X5  <- as.numeric(tail(CSPEngulfing(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X6  <- as.numeric(tail(CSPGap(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X7  <- as.numeric(tail(CSPHammer(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X8  <- as.numeric(tail(CSPHarami(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X9  <- as.numeric(tail(CSPInsideDay(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X10 <- as.numeric(tail(CSPInvertedHammer(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X11 <- as.numeric(tail(CSPKicking(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X12 <- as.numeric(tail(CSPLongCandle(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X13 <- as.numeric(tail(CSPLongCandleBody(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X14 <- as.numeric(tail(CSPMarubozu(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X15 <- as.numeric(tail(CSPNHigherClose(SPFB.RTS[max(1, i-indicatorDepth):i,],N = 3), 1))
    X16 <- as.numeric(tail(CSPNLowerClose(SPFB.RTS[max(1, i-indicatorDepth):i,],N = 3), 1))
    X17 <- as.numeric(tail(CSPOutsideDay(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X18 <- as.numeric(tail(CSPPiercingPattern(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X19 <- as.numeric(tail(CSPShortCandle(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X20 <- as.numeric(tail(CSPShortCandleBody(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X21 <- as.numeric(tail(CSPStar(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X22 <- as.numeric(tail(CSPStomach(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X23 <- as.numeric(tail(CSPTasukiGap(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X24 <- as.numeric(tail(CSPThreeInside(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X25 <- as.numeric(tail(CSPThreeMethods(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X26 <- as.numeric(tail(CSPThreeOutside(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X27 <- as.numeric(head(tail(nextCandlePosition(SPFB.RTS[max(1, i-indicatorDepth):i,]), 2), 1))
    X28 <- as.numeric(tail(TrendDetectionChannel(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    X29 <- as.numeric(tail(TrendDetectionSMA(SPFB.RTS[max(1, i-indicatorDepth):i,]), 1))
    target <- as.numeric(SPFB.RTS[i+2,"SPFB.RTS.Open"]) - as.numeric(SPFB.RTS[i+1,"SPFB.RTS.Open"])
    trainData[i,] <<- c(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,X20,X21,X22,X23,X24,X25,X26,X27,X28,X29,target)
    cat(i, "/", nrow(SPFB.RTS)-2, "\n")
  }
  colnames(trainData)[-ncol(trainData)] <- paste0("pred",1:(ncol(trainData)-1))
  colnames(trainData)[ncol(trainData)] <- "target"
  save(trainData, file="trainData.rdata")
})

# trainData <- trainData[-(1:indicatorDepth),]
 
Dr.Trader:

I am concerned about two points in the code:

1) Since we use OHLC for the forecast, the very last bar cannot be used for prediction, because we make the prediction at the beginning of the bar and its high, low and close will change during the lifetime of the bar. It turns out that we train the model on a fully formed last bar, and then in live trading we predict from an unformed one. You can't do that. You should shift the target by 2 bars, not 1.
If you use only open prices for the forecast, ignoring high, low and close, you may shift the target by 1 bar.

I don't understand the problem. We predict the position of the current close relative to the previous close. We don't know the current candle, but we do know the previous one: it has already closed, so all of its OHLC prices are fully formed. I don't see what the caution is about or where the problem lies.

2) You use Close instead of Open for the target. Is it important for some kind of strategy or just for fun? ........

It was chosen for no particular reason, just for speed and convenience.

If candlesticks repaints, you have to work around it........

I don't understand this piece. What does it do? What is SPFB.RTS.rdata?

Where did it come from, and why is it overwritten? I don't understand it at all (!


tryCatch({
  load("SPFB.RTS.rdata")
}, error = function(e){
  getSymbols("SPFB.RTS", src = "Finam", period="5min", from = Sys.Date()-500)
  save(SPFB.RTS, finam.stock.list, file = "SPFB.RTS.rdata")
  SPFB.RTS <<- SPFB.RTS
  finam.stock.list <<- finam.stock.list
})

And most importantly, you have probably tried to train something on this data, so why don't you mention it?

 
mytarmailS:

And most importantly, you have probably tried to train something on this data, so why don't you mention it?

The training table is at row 16527 of 55857. As soon as it is built, I will try training on it.

mytarmailS:

We predict the position of the current close relative to the previous close

That's up to you. It's just strange: strategies usually make a decision and open trades at the beginning of a new bar.
In your case the prediction has to be made and the trade opened right at the end of the bar. That is somewhat inconvenient, whereas a new bar in the terminal is easy to catch. "Open a trade at the end of the current bar, a second before the new bar opens, hoping that the current close price is already final" is too vague for me.

mytarmailS:

I don't understand this piece. What does it do? What is SPFB.RTS.rdata?

Those are the downloaded quotes. The script downloads them and saves them to an rdata file, so that it does not have to fetch them again on every run and wait while they download. If they were already downloaded earlier and saved to the rdata file, they are loaded from there.
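A rough sketch of the same load-or-download idea, written with file.exists() instead of tryCatch(load(...)). The helper load_or_build and the stand-in fake_download are hypothetical names, and saveRDS/readRDS are used here in place of save/load purely to keep the example short; the real script would call getSymbols where the stand-in is:

```r
# Return the cached object if the file exists, otherwise build it and cache it
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    readRDS(path)
  } else {
    obj <- build()
    saveRDS(obj, path)
    obj
  }
}

# Stand-in for the slow download; counts how often it is actually called
counter <- new.env()
counter$n <- 0
fake_download <- function() {
  counter$n <- counter$n + 1
  data.frame(Open = c(100, 101), Close = c(101, 102))
}

cache_file <- tempfile(fileext = ".rds")
first  <- load_or_build(cache_file, fake_download)   # builds and saves
second <- load_or_build(cache_file, fake_download)   # reads from the cache

print(counter$n)
```

Checking for the file first also avoids the "cannot open compressed file" warning that load() emits when the cache does not exist yet.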

 
Dr.Trader:

Thanks

 
Dr.Trader:

There are two things in the code that make me suspicious:

1) Because we use OHLC for prediction, we can't use the latest bar as an input: we make the prediction at the beginning of the bar, and its high, low and close will change during the lifetime of the bar. It turns out that we train the model on a fully formed last bar, and then in live trading we predict from an unformed one. You can't do that. You should shift the target by 2 bars, not 1.
A shift of the target by 1 bar would be acceptable only if you used the open price alone for prediction, ignoring high, low and close.

2) You are using Close instead of Open for the target. Is that important for some strategy, or was it chosen arbitrarily? After all, we usually enter a trade at the beginning of a bar, and then on the next new bar we either reverse, hold, or exit. So for the trained model it is important to predict the price change from the current open to the next open.
The close of the current bar does not necessarily coincide with the open of the next bar. By taking close as the target you can avoid the first error, but you may get wrong increments instead. I have just checked: the close in the price table usually does not coincide with the open of the next bar, so your target values are very doubtful.

It seems to me that you are overcomplicating things.

1. The first problem is closely related to the second. If we use Close as the target, the other three prices are already formed and do not change. When forecasting one step ahead, we should shift the target by 1 position.

2. I also cannot accept your arguments about the difference between Close and Open prices. It depends on the timeframe and on the day of the week. If we take H1, there are three different cases:

  • The usual case, where these values coincide or differ by a few pips. What share of the profit we are trying to take is that?
  • The case of a gap, which can easily be 100 points. Will it necessarily fall between Close and Open? On October 7 there was a huge gap just a few minutes before the end of the hour.
  • The case of Friday into Monday. That is a special case.

This is just what comes to mind, and there may be many more situations. They all say that real trading is very different from the model. At this point it is more useful to focus on an idealized version and to program the remaining real-world effects separately.

3. Concerning repainting.

Non-repainting is the sacred cow of TA. It has been treated as an immutable truth for ages.

And on what basis?

For all specialists engaged in ANALYSIS, it is unacceptable to change the data on which they base their epochal conclusions about the past. Once again: for the past, it is unacceptable.

We are in the business of forecasting, and our view of the past not only can, but must, change based on newly available data. There is a price for data that does not repaint: lag.

Here comes a new bar that signals a market reversal. But we, continuing to feed the sacred cow, refuse to change our view of history for the sake of an idea borrowed from the "analysis" camp.

We shouldn't be afraid of indicator values changing.

A new bar has arrived. On its arrival we need to make decisions under the conditions that this new bar has created. We need to predict the future, until the next forecast, with minimal error. I haven't seen any publications linking the magnitude of the prediction error to changes of indicator values in history.

This is all theoretical talk. We need to build a model and evaluate it. That's what I will do.

 

I made a small dataset of 5000 prices.

The script didn't work correctly.

When everything had been computed, I got a warning:

....
....
....
5677 / 5688
5678 / 5688
5679 / 5688
5680 / 5688
5681 / 5688
5682 / 5688
5683 / 5688
5684 / 5688
5685 / 5688
5686 / 5688
5687 / 5688
5688 / 5688
Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file 'trainData.rdata', probable reason 'No such file or directory'

the data

head(trainData)
     <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
     <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
     <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
[6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
     <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> target
[1,]   NA   NA   NA   NA   NA   NA   NA   NA     NA
[2,]   NA   NA   NA   NA   NA   NA   NA   NA     NA
[3,]   NA   NA   NA   NA   NA   NA   NA   NA     NA
[4,]   NA   NA   NA   NA   NA   NA   NA   NA     NA
[5,]   NA   NA   NA   NA   NA   NA   NA   NA     NA
[6,]   NA   NA   NA   NA   NA   NA   NA   NA     NA


tail(trainData)
                <NA> <NA>         <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[5683,] 8.621061e-05   10 0.0016378604  190    0    1    0    0    0    0    0    0    0
[5684,] 6.036304e-04   70 0.0010346611  120    0    0    0    0    0    1    0    0    0
[5685,] 1.208355e-03  140 0.0018122977  210    0    0    0    0    0    0    0    0    0
[5686,] 6.911447e-04   80 0.0019009764  220    0    0    0    0    0    0    0    0    0
[5687,] 2.592577e-04   30 0.0007778402   90    0    0    0    0    0    0    0    0    0
[5688,] 9.501188e-04  110 0.0016415396  190    0    0    0    0    0    0    0    0    0
        <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[5683,]    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
[5684,]    0    0    1    0    0    0    0    0    0    1    0    0    0    0    0    0
[5685,]    0    0    0    0    0    0    0    1    0    1    0    0    0    0    0    0
[5686,]    0    0    0    0    0    0    0    1    0    1    0    0    0    1    0    0
[5687,]    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
[5688,]    0    0    0    0    0    0    1    0    1    0    0    0    0    0    1    0
        <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[5683,]    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
[5684,]    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
[5685,]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1
[5686,]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
[5687,]    1    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
[5688,]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
        <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> target
[5683,]   NA   NA   NA   NA   NA   NA    0    1    0    0    0    0    1   -1    -70
[5684,]   NA   NA   NA   NA   NA   NA    0    1    0    0    0    0    1   -1   -140
[5685,]   NA   NA   NA   NA   NA   NA    0    0    1   -1    0    0    1   -1    -90
[5686,]   NA   NA   NA   NA   NA   NA    0    0    1   -1    0    0    1   -1     20
[5687,]   NA   NA   NA   NA   NA   NA    0    0    1   -1    0    0    1   -1    100
[5688,]   NA   NA   NA   NA   NA   NA    0    0    1   -1    0    0    1   -1     50

There are always NAs in the data, although I can't rule out that I caused this myself.
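A possible explanation for the <NA> column names, sketched on a tiny made-up matrix rather than the real trainData: a matrix created with matrix() has no column names, and assigning a single name by index into those missing names pads the rest with NA. The leading all-NA rows would simply be the first indicatorDepth rows, which the loop never fills. Setting all names in one assignment and dropping the unfilled rows could look roughly like this (the sizes and the fill value are assumptions for the demo):

```r
indicatorDepth <- 3

# Toy stand-in for trainData: the loop only fills rows after indicatorDepth
m <- matrix(NA_real_, nrow = 6, ncol = 4)
m[(indicatorDepth + 1):nrow(m), ] <- 1

# What the posted code does for the last column: the names start out as NULL,
# so naming only position 4 leaves NA in positions 1 to 3
broken <- m
colnames(broken)[ncol(broken)] <- "target"
print(colnames(broken))

# Name every column in a single assignment instead
fixed <- m
colnames(fixed) <- c(paste0("pred", seq_len(ncol(fixed) - 1)), "target")

# Drop the leading rows that were never computed
fixed <- fixed[-seq_len(indicatorDepth), , drop = FALSE]
print(fixed)
```

This sketch does not explain the NA values in the middle columns of tail(trainData); those would have to be traced to the individual indicator calls. The warning itself only reports that load() did not find trainData.rdata, which is the case the error handler is there to cover.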
