Machine learning in trading: theory, models, practice and algo-trading - page 2802

 

As a check, I tested it on my own data set using this script. Result:

# patch <- "C:/RData/Project/FEDOT/"
# df1 <- fread(paste0(patch, "DF_train_M5.csv"))
# object.size(df1)  # 780184 bytes
# dim(df1)          # [1] 4030   25
# ft <- as_fst(df1)
# rm(df1)
# ft %>% select_fst(cols = c(1:3, 25), negate = TRUE) -> dt
# dim(dt)           # [1] 4030   21
bench::workout({
    for (i in seq_along(cor.test.range)) {
        # drop correlated predictors at the current threshold
        get.findCor(dt, cor.coef = cor.test.range[i]) -> dt.n
        # write the reduced set to "train2_1.csv", "train2_2.csv", ...
        paste0("train2_", cor.test.range[i] * 10, ".csv") %>%
            paste0(patch, .) %>% fwrite(dt.n, .)
        rm(dt.n)
    }
}) -> t1  # 12.9 min
setwd(patch)
dim(fread("train2_1.csv"))
#[1] 4030    3
dim(fread("train2_2.csv"))
#[1] 4030    6
dim(fread("train2_3.csv"))
#[1] 4030   10
dim(fread("train2_4.csv"))
#[1] 4030   13
dim(fread("train2_5.csv"))
#[1] 4030   16
dim(fread("train2_6.csv"))
#[1] 4030   17
dim(fread("train2_7.csv"))
#[1] 4030   18
dim(fread("train2_8.csv"))
#[1] 4030   18
dim(fread("train2_9.csv"))
#[1] 4030   18

It takes quite a long time (12.9 min), though the frame is not small either. It really needs to be parallelised, or a faster correlation function found.
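A rough sketch of one way to parallelise it (my own assumption, not code from the thread; it reuses dt, patch, cor.test.range and get.findCor from the script above and assumes whatever get.findCor needs is available on the workers):

library(parallel)
library(data.table)

cl <- makeCluster(detectCores() - 1)
# make the data, the filter function and the output path visible on the workers
clusterExport(cl, c("dt", "get.findCor", "patch"))
clusterEvalQ(cl, library(data.table))

# each threshold is independent, so the iterations can run concurrently
parLapply(cl, cor.test.range, function(cc) {
    dt.n <- get.findCor(dt, cor.coef = cc)
    fwrite(dt.n, paste0(patch, "train2_", cc * 10, ".csv"))
    NULL
})

stopCluster(cl)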

From the initial 21 predictors, the different thresholds select different numbers of predictors.

But this is not the way to go.

Good luck

 
СанСаныч Фоменко #:

You didn't pay attention to the variability of sd

I'll pay attention next time; I'll compute the sd of the sd of the sd of the sd %)

 

Binding the offset of the feature window to some indicator (e.g. std) did not yield anything:

the larger the value, the larger the offset, as a multiple of that value,

or vice versa. I tried both.

There is also the option of expanding/narrowing the window (plus an offset?), which I haven't tried yet.

The only thing I can see is enumerating such variants within the framework of fractals.
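For illustration only, a minimal sketch of the "offset proportional to std" idea as I understand it (my own assumed formulation, not Maxim's code; the window statistic and scaling are arbitrary):

# Hypothetical: shift the start of the feature window back by a multiple of the
# current rolling volatility (std), relative to its average level.
library(zoo)

adaptive_offset_feature <- function(price, win = 20, k = 2) {
    vol     <- zoo::rollapplyr(price, width = win, FUN = sd, fill = NA)
    avg_vol <- mean(vol, na.rm = TRUE)
    out     <- rep(NA_real_, length(price))
    for (i in seq_along(price)) {
        if (is.na(vol[i])) next
        off   <- round(k * win * vol[i] / avg_vol)   # larger std -> larger offset
        start <- i - win - off + 1
        if (start < 1) next
        out[i] <- mean(price[start:(start + win - 1)])   # any window statistic
    }
    out
}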

 
Vladimir Perervenko #:

1. You don't need to load the "caret" package into the global environment. It is very heavy and pulls in a lot of dependencies and data, and you only need one function from it. Import it directly inside the get.findCor function.

Wow, I didn't realise it was that heavy
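For reference, a minimal sketch of what that might look like, assuming get.findCor wraps caret::findCorrelation (that is only a guess; the thread does not show the function's body):

# Hypothetical version of get.findCor that calls the one caret function via ::
# instead of library(caret); not the original function.
get.findCor <- function(dt, cor.coef = 0.9) {
    corr <- cor(dt, use = "pairwise.complete.obs")
    drop <- caret::findCorrelation(corr, cutoff = cor.coef)   # columns to remove
    if (length(drop) == 0) return(dt)
    dt[, -drop, with = FALSE]   # assumes dt is a data.table of numeric predictors
}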


Vladimir, do you know if there is a backtesting package that keeps a log of trades and all the rest (i.e. nothing primitive), apart from the slow "quantstrat" and "SIT"?

 
Maxim Dmitrievsky #:

Binding the offset of the feature window to some indicator (e.g. std) did not yield anything:

the larger the value, the larger the offset, as a multiple of that value,

or vice versa. I tried both.

There is also the option of expanding/narrowing the window (plus an offset?), which I haven't tried yet.

The only thing I can see is enumerating such variants within the framework of fractals.

Absolutely.

Everything has to be recalculated at each step.

 
СанСаныч Фоменко #:

Absolutely.

Everything has to be recalculated at every step.

I find it easier to go through ways of recalculating labels vs features on a large dataset than to retrain on every bar; otherwise I'd be stuck for a long time.

And with frequent retraining you are still picking up some general pattern, if you look at it globally. Unless that design is bleeding money, of course.
 
Maxim Dmitrievsky #:

It's easier for me to go through ways of recalculating labels vs features on a large dataset than to retrain on every bar; I'd be stuck for a long time

Totally agree, and this is the reason why I can't switch to an EA.

But it's a matter of principle. I switched to the "train at every step" scheme because of the hidden look-ahead that arises from preparing the whole dataset at once. I have exactly this problem, and I have not been able to find which predictors generate the look-ahead effect.
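Schematically, the "train at every step" idea is a walk-forward loop like this (my own sketch with hypothetical fit/predict arguments, not СанСаныч's code):

# Hypothetical walk-forward loop: refit on a trailing window before every bar,
# so label preparation never sees anything past the bar being predicted.
walk_forward <- function(X, y, win = 500, fit, predict_fn) {
    n     <- nrow(X)
    preds <- rep(NA_real_, n)
    for (i in (win + 1):n) {
        idx      <- (i - win):(i - 1)                 # trailing training window
        model    <- fit(X[idx, , drop = FALSE], y[idx])
        preds[i] <- predict_fn(model, X[i, , drop = FALSE])
    }
    preds
}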

 
СанСаныч Фоменко #:

Totally agree, and this is the reason why I can't switch to an EA.

But it's a matter of principle. I switched to the "train at every step" scheme because of the hidden look-ahead that arises from preparing the whole dataset at once. I have exactly this problem, and I have not been able to find which predictors generate the look-ahead effect.

Make a gap between the training section and the test section, at least a couple of days, so that the future of the last training bars is just as unknown as that of the first test bars.
That section is called the "embargo", I believe.
Once I reduced training to 1 day and the test to 1 day, while the markup looked a few days ahead, i.e. the model effectively saw what would happen on the new bars. The results were very good.
When the training interval was increased to a week, the result was still above 50/50. But the longer it got, the worse: rows with look-ahead were mixed with rows without it, and that spoilt everything ))))

In general, this embargo section should be no shorter than the look-ahead used for the target (teacher).
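A minimal sketch of such a split with an embargo gap (an assumed helper, not code from the thread):

# Hypothetical train/test split with an embargo gap at least as long as the
# look-ahead horizon used when marking up the target.
split_with_embargo <- function(n_rows, test_size, lookahead) {
    test_idx  <- (n_rows - test_size + 1):n_rows
    # skip `lookahead` bars before the test set so no training label
    # is computed from bars that fall inside the test period
    train_end <- n_rows - test_size - lookahead
    list(train = seq_len(train_end), test = test_idx)
}

# e.g. 4030 rows, a 500-bar test, labels marked up 12 bars ahead
idx <- split_with_embargo(n_rows = 4030, test_size = 500, lookahead = 12)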

 
mytarmailS #:

Wow, I didn't realise it was that heavy

Vladimir, do you know if there is a backtesting package that keeps a log of trades and all the rest (i.e. nothing primitive), apart from the slow "quantstrat" and "SIT"?

I don't know. I haven't come across one.

 
mytarmailS #:
I don't know, maybe you changed the separator or something.
I still don't understand what error the script gives.
And why did you install the packages on the newer R if you're running the old R?

No, I didn't.

Well, I attached the logs in that post; I couldn't make sense of the error either.

Because that's how R works: one script needs the older version, another the newer one. Very inconvenient; there isn't even proper backward compatibility.

mytarmailS #:

Here, try this one. I had to rewrite the whole thing; the code was so shitty that I couldn't understand what it did.

Thanks. But where do I specify the path to the files? In the previous script it was clear where the path was set; here it isn't. Besides, the point of that one was that it had a loop.