Machine learning in trading: theory, models, practice and algo-trading - page 2367

 
mytarmailS:

Ohh, long sigh... I forgot, I'm tired)

384 GB RAM?

I don't need that much - 64 is enough.

 
Aleksey Vyazmikin:

I don't need that much - 64 is enough.

Ok, well, let's see. I'm still sorting out the code myself - how best to do it, what can be optimized - thinking it over, going through options. I don't want to bother you over nothing either; anyway, I'll keep it in mind...

 
Aleksey Nikolayev:

Some things that you later come to quite like seem disgusting at first - coffee, caviar, wasabi, rock music, etc.)

That's true. At first I also didn't understand some of the constructs in R and thought they were nonsense. For example, I wrote everything with loops and didn't understand apply, but then it turned out you can improve readability and speed and turn 6 lines of code into one.
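
For illustration only, a minimal sketch of the kind of rewrite meant here, with a hypothetical data frame df (not from the original posts):

# hypothetical data frame with a few numeric columns
df <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

# loop version: preallocate, iterate, assign
means <- numeric(ncol(df))
for (i in seq_along(df)) {
  means[i] <- mean(df[[i]])
}

# apply-family version: same values in one line
means <- sapply(df, mean)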

 
mytarmailS:

It's true, at first I also didn't understand some of the constructs in R and thought they were nonsense

For example, I wrote everything with loops and didn't understand the apply family, but then it turned out you can improve readability and speed and turn 6 lines of code into one

Not only apply. I use foreach more often - you can parallelize it without rewriting the code... Sometimes an iterator is useful too; try it:

library(coro)
abc <- generate_abc()
loop(for (x in abc) print(x))

Good luck
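
(A minimal sketch of the foreach point above, assuming the doParallel backend; the cluster size and the toy computation are made up, not from the original post:)

library(foreach)
library(doParallel)

# register a parallel backend; %dopar% sends iterations to the workers
# while the loop body itself stays unchanged
cl <- makeCluster(2)
registerDoParallel(cl)

res <- foreach(i = 1:8, .combine = c) %dopar% {
  sqrt(i)  # stand-in for a heavier per-iteration computation
}

stopCluster(cl)
print(res)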

 
Vladimir Perervenko:

Not only apply. I use foreach more often - you can parallelize it without rewriting the code... Sometimes an iterator is useful too; try it

Good luck

Thank you!

 
mytarmailS:

Thank you!

What is generate_abc? I still don't understand, because the example gives an error:

library(coro)
> abc <- generate_abc()
Error in generate_abc() : could not find function "generate_abc"
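
(For reference: generate_abc() is not a function exported by coro; in the package's examples it is first defined as a generator. A minimal sketch along those lines:)

library(coro)

# a generator factory: each call to generate_abc() returns a fresh
# iterator that yields "a", "b", "c" lazily, one value per request
generate_abc <- generator(function() {
  for (x in letters[1:3]) {
    yield(x)
  }
})

abc <- generate_abc()
loop(for (x in abc) print(x))  # prints "a", "b", "c"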
 

All these operations exist in Python:

print([x for x in range(50)])
 
It all started in Lisp and is most developed in functional programming, elements of which are present in both R and Python.
 
I happened to read an article with a statement that surprised me: "Predictors, responses and residuals: What really needs to be normally distributed?"

A few quotes:

"Many scientists are concerned about the normality or non-normality of variables in statistical analysis. The following and similar views are often expressed, published, or taught:

  • " If you want to do statistics, then everything has to be normally distributed .
  • " We normalized our data to fit the assumption of normality .
  • " We log-transformed our data because it had a highly skewed distribution . "
  • "After we fitted the model, we checked the homoscedasticity of the residuals .
  • " We used a non-parametric test because our data did not meet the assumption of normality .

And so on. I know it's more complicated than that, but it still seems that the normal distribution is what people want to see everywhere, and that the normal distribution of things opens the door to clean and convincing statistics and strong results. Many people I know routinely check to see if their data are normally distributed before analysis, and then they either try to "normalize" it, for example by using a logarithmic transformation, or they adjust the statistical method accordingly based on the frequency distribution of their data. Here I will explore this more closely and show that there may be fewer assumptions about normality than one might think."
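
(The article's point, in code terms: for an ordinary linear regression, the normality assumption concerns the residuals, not the raw predictors or response. A minimal sketch with made-up data:)

set.seed(2)
x <- rexp(200)                      # deliberately skewed predictor
y <- 2 + 3 * x + rnorm(200)         # normal errors despite the skewed x

fit <- lm(y ~ x)

# what is worth inspecting is the distribution of the residuals
qqnorm(residuals(fit)); qqline(residuals(fit))
shapiro.test(residuals(fit))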

Then the justification for this idea, and the conclusion:

" Why do people still normalize data?

Another puzzling problem is why people still tend to "normalize" their variables (both predictors and responses) before model fitting. Why has this practice emerged and become prevalent, even if there are no assumptions to cause it? I have several theories about this: ignorance, a tendency to follow statistical cookbooks, error propagation, etc. D.
Two explanations seem more plausible: first, people normalize data to linearize relationships. For example, a logarithmic predictor transformation can be used to pick up an exponential function using the usual least squares mechanism. This may seem normal, but then why not specify the nonlinear relationship directly in the model (e.g., using the appropriate reference function)? In addition, the practice of logarithmic response transformation can lead to serious artifacts, such as in the case of zero-count data (O'Hara & Kotze 2010).
A second plausible reason for the "normalization" of the practice was suggested by my colleague Catherine Mertes-Schwartz: it may have to do with researchers trying to solve a problem and their data being collected very slickly and unevenly. In other words, very often one is working with data that has a large number of observations aggregated in a certain part of the gradient, while the other part of the gradient is relatively underrepresented. This leads to distorted distributions. Converting such distributions leads to a seemingly regular distribution of observations along the gradient and the elimination of outliers. In fact, this can be done in good faith. However, this, too, is fundamentally wrong."

For me this statement is (shocking?) - I can't find the right word. But I will keep it in mind.
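
(A side note on the "specify the nonlinear relationship directly in the model" remark in the quote above: a minimal sketch with made-up data, contrasting a log-transformed response fitted by least squares with a GLM that models the nonlinearity through a log link. Illustrative only, not from the article.)

set.seed(1)
x <- runif(200, 0, 3)
y <- exp(1 + 0.8 * x) * exp(rnorm(200, sd = 0.2))  # made-up exponential relationship

# common practice: transform the response, then ordinary least squares
fit_log <- lm(log(y) ~ x)

# alternative: keep the response on its original scale and
# specify the nonlinearity directly via a log link
fit_glm <- glm(y ~ x, family = gaussian(link = "log"))

summary(fit_log)
summary(fit_glm)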

Predictors, responses and residuals: What really needs to be normally distributed?
  • www.r-bloggers.com
[This article was first published on Are you cereal? » R, and kindly contributed to R-bloggers.]
 
Maxim Dmitrievsky:

All these operations exist in Python

It's not about print, it's about generators and iterators.
