Research in matrix packages - page 2

 
Alexey Burnakov:
I'll post a couple of useful code snippets on the subject tomorrow.
#  hypothesis testing

#  two-sample mean comparison

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

#  two-sample location comparison (Wilcoxon rank-sum / Mann-Whitney test, often read as a median comparison)

wilcox.test(x, y = NULL,
            alternative = c("two.sided", "less", "greater"),
            mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
            conf.int = FALSE, conf.level = 0.95, ...)

#  comparison of two distributions (Kolmogorov-Smirnov test)

ks.test(x, y, ...,
        alternative = c("two.sided", "less", "greater"),
        exact = NULL)

#  normality test (Shapiro–Wilk; accepts between 3 and 5000 values)

shapiro.test(x)


# independence / goodness of fit / homogeneity tests for categorical variables
chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)

#  variance / covariance / correlation

var(x, y = NULL, na.rm = FALSE, use)

cov(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))

cor(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))

# ordinary linear regression
lm(formula, data, subset, weights, na.action,
   method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
   singular.ok = TRUE, contrasts = NULL, offset, ...)   # that easy =)
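The listing above gives only the signatures. A minimal sketch of calling them on simulated data; the sample sizes, the 0.5 mean shift and the 2x2 table counts are arbitrary illustration values, not taken from the original post:

```r
set.seed(42)                       # reproducible simulated data
x <- rnorm(200, mean = 0)          # sample 1
y <- rnorm(200, mean = 0.5)        # sample 2, shifted by 0.5 (illustrative)

tt <- t.test(x, y)                 # Welch two-sample t-test (default var.equal = FALSE)
wt <- wilcox.test(x, y)            # Wilcoxon rank-sum test
ks <- ks.test(x, y)                # two-sample Kolmogorov-Smirnov test
sw <- shapiro.test(x)              # Shapiro-Wilk normality test

# chi-squared test of independence on a hypothetical 2x2 contingency table
tab <- matrix(c(30, 10, 15, 25), nrow = 2)
ct  <- chisq.test(tab)             # Yates continuity correction is applied by default for 2x2

# correlation and ordinary linear regression
z   <- 2 * x + rnorm(200, sd = 0.5)
r_p <- cor(x, z)                        # Pearson
r_s <- cor(x, z, method = "spearman")   # Spearman rank correlation
fit <- lm(z ~ x)                        # ordinary least squares
print(coef(fit))                        # intercept near 0, slope near 2

print(c(t = tt$p.value, wilcox = wt$p.value, ks = ks$p.value,
        shapiro = sw$p.value, chisq = ct$p.value))
```

Each test returns an object of class `htest`, so the p-value, statistic and confidence interval can be pulled out with `$` instead of reading the printed report.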
 
Alexey Burnakov:

#  Shapiro–Wilk test on a random subset of 4999 rows (shapiro.test() accepts at most 5000 values)

sample_idx <- sample(x = nrow(lateral_residuals), size = 4999, replace = FALSE)
shapiro.test(x = lateral_residuals$`lateral_linear_model$residuals`[sample_idx])
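`shapiro.test()` refuses samples larger than 5000, which is presumably why the snippet draws 4999 rows. A self-contained sketch of the same idea; `res` is a hypothetical stand-in for the residual column, simulated here rather than taken from the post:

```r
set.seed(1)
res <- rnorm(100000)               # hypothetical stand-in for a long residual vector

# test a random subsample of 4999 values, drawn without replacement
idx <- sample(x = length(res), size = 4999, replace = FALSE)
sw  <- shapiro.test(res[idx])
print(sw)

# on the full vector shapiro.test() stops with an error (sample size > 5000)
full_ok <- tryCatch({ shapiro.test(res); TRUE }, error = function(e) FALSE)
print(full_ok)                     # FALSE
```

Keep in mind that with several thousand points the test detects even tiny deviations from normality, so a Q-Q plot is a useful companion.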
 
Alexey Burnakov:

#  normal distribution: d = density, p = cumulative distribution function, q = quantile, r = pseudo-random numbers

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)


# the same four functions for the uniform distribution

dunif(x, min = 0, max = 1, log = FALSE)
punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
runif(n, min = 0, max = 1)
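A short sketch checking how the four prefixes relate for N(0, 1) and U(0, 1): `q*` inverts `p*`, and `r*` draws samples whose moments approach the parameters. The specific arguments (1.96, 0.3, the seed and sample sizes) are arbitrary illustration values:

```r
# normal distribution
print(dnorm(0))        # density at the mean of N(0, 1): 1/sqrt(2*pi), about 0.3989
print(pnorm(1.96))     # P(X <= 1.96), about 0.975
print(qnorm(0.975))    # quantile (inverse CDF), about 1.96
set.seed(7)
z <- rnorm(10000, mean = 5, sd = 2)   # 10,000 draws from N(5, 2^2)
print(c(mean(z), sd(z)))              # close to 5 and 2

# uniform distribution on [0, 1]
print(dunif(0.3))      # density is 1 everywhere on [0, 1]
print(punif(0.3))      # P(U <= 0.3) = 0.3
print(qunif(0.3))      # 0.3, since the U(0, 1) CDF is the identity on [0, 1]
u <- runif(5, min = -1, max = 1)      # 5 draws from U(-1, 1)
print(all(u >= -1 & u <= 1))          # TRUE
```

The same naming scheme holds for the other built-in distributions (`*binom`, `*pois`, `*exp`, `*t`, `*chisq` and so on).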
 
Alexey Burnakov:
#  ANOVA table for one fitted model, or F-tests comparing several nested fitted models

anova(object, ...)   # that easy =)
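A sketch of both uses of `anova()` on simulated data. The names `x1`, `x2`, `m1`, `m2` are illustrative, and `x2` is irrelevant to `y` by construction, so the comparison asks whether adding it improves the fit:

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)                     # irrelevant predictor, by construction
y  <- 1 + 2 * x1 + rnorm(n)

m1 <- lm(y ~ x1)                   # smaller model
m2 <- lm(y ~ x1 + x2)              # larger model; m1 is nested in it

print(anova(m1))                   # sequential ANOVA table for one model
cmp <- anova(m1, m2)               # F-test: does adding x2 reduce the RSS significantly?
print(cmp)
```

The models passed together must be nested and fitted to the same rows; otherwise the F-test is not meaningful.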
 
Alexey Burnakov:
# create bar charts with error bars (on the first 10,000 rows)
library(ggplot2)

subdat <- head(pre_an_int_eff, 10000)

dodge <- position_dodge(width = 0.9)

# columns 9 and 10 of subdat hold the lower and upper error-bar limits
p <- ggplot(subdat, aes(x = sample_description, y = mean, fill = sample_description)) +
        geom_bar(stat = "identity", position = dodge) +
        geom_errorbar(aes(ymin = .data[[names(subdat)[9]]], ymax = .data[[names(subdat)[10]]]),
                      position = dodge, width = 0.25) +
        theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
              legend.position = "none")

print(p)
That is all, folks!
 
Alexey Volchanskiy:

By the way, a beginner's question for anyone who knows R. I see that there are several R distributions: R Server, something called "A web application framework for R" (http://shiny.rstudio.com/), monster packages from Microsoft... Which one should I choose?

RStudio is good enough: it is simply an improved interface (IDE) on top of the language, and it works with any R packages as well as add-ons from its developer. Shiny is just another R package, for building controls, input forms and all sorts of web demos.
I haven't used the Microsoft offering, so I can't say.
 

Guys!

If you have even the slightest prerequisites (programming experience in any language and some knowledge of statistics), then R, and only R.

Matlab is not comparable at all: it is a different kind of package, and an expensive commercial one.

R's competitors are SAS and SPSS, but they are commercial packages and R is beginning to overtake them. Five years ago Matlab was still being compared with R, but I no longer see it in recent reviews; it has faded into obscurity.

Nowadays R is the standard for statistics: there is a huge number of publications and, in general, a very strong community.

For example, a very useful blog with new posts every day; you can subscribe to its news: http://www.r-bloggers.com/

Here's a bunch of books for very reasonable money: http://www.twirpx.com/search/?query=R (I just searched for "R"; keyword search works well there).

Let's not forget that R, as an algorithmic programming language, ranks among the top ten programming languages, next to the C family.

To get started, take plain R together with RStudio. Also, let's not forget that the company behind the commercial R distribution (Revolution Analytics) was bought by Microsoft, which is starting to promote its own version, so follow the developments.

 
СанСаныч Фоменко:

Guys!

If you have even the slightest prerequisites (programming experience in any language and some knowledge of statistics), then R, and only R.

Well, it's my first day slowly learning R. Please answer my questions; I want to compare the capabilities of R and Matlab. But without any hype, in a balanced and calm manner :).

  1. Is R a language with OOP capabilities?
  2. Can I use R to build a 32-bit and a 64-bit DLL for direct use from MQL4/5? If so, how large a package has to be installed on a user's computer to use such a DLL?
  3. Can I connect common databases for direct access from R?
  4. Is there an analogue of Simulink in R?
  5. Why do all the reviews emphasise that R is a statistics program? I am interested in DSP; does R have packages for digital signal processing?
  6. Is there a built-in compact data storage format in R, similar to .mat files in Matlab?

 
Alexey Volchanskiy:

Well, it's my first day learning R. Please answer my questions; I want to compare the capabilities of R and Matlab. But without any hype, in a balanced and calm way :).


  1. Can I connect common databases for direct access from R?

  2. Why do all the reviews emphasise that R is a statistics program? I am interested in DSP; does R have packages for digital signal processing?

Yes and yes. A colleague of mine hooks R up to MS SQL.

Signals: https://cran.r-project.org/web/packages/signal/index.html

There are probably other similar packages as well.

R grew out of S and was originally developed for statistical data processing. It may lack some features of full-fledged general-purpose languages, but it is convenient for statistical research, and there are many (thousands of) open-source packages for data processing and analysis.

Even the latest trends in machine learning, deep learning and the much-hyped XGBoost, have already been implemented.
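The signal package linked above is one option; base R alone already covers basic DSP through `fft()` and `stats::filter()`. A minimal sketch under assumed values (a 1 kHz sampling rate and a 50 Hz + 120 Hz test signal, chosen only for illustration) that finds the dominant frequency and applies a moving-average FIR filter:

```r
fs <- 1000                              # sampling rate in Hz (illustrative assumption)
n  <- 1000                              # 1 second of samples
time_s <- (0:(n - 1)) / fs              # sample times in seconds

set.seed(1)
x <- sin(2 * pi * 50 * time_s) +        # 50 Hz component, amplitude 1
     0.5 * sin(2 * pi * 120 * time_s) + # 120 Hz component, amplitude 0.5
     rnorm(n, sd = 0.2)                 # additive white noise

# amplitude spectrum with base R's fft()
X     <- fft(x)
freqs <- (0:(n - 1)) * fs / n           # frequency of each FFT bin
amp   <- Mod(X) / n                     # a sine of amplitude A shows up as A/2 in each of its two bins
pos   <- 2:(n %/% 2)                    # positive frequencies, DC bin excluded
peak_hz <- freqs[pos][which.max(amp[pos])]
print(peak_hz)                          # 50

# 5-tap centred moving-average (FIR low-pass) filter via convolution
k <- 5
x_smooth <- stats::filter(x, rep(1 / k, k), method = "convolution", sides = 2)
```

`stats::filter` is written with its namespace on purpose: packages such as dplyr also export a `filter()` that would otherwise mask it. For Butterworth and other IIR designs, the signal package is the place to look.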

 
Alexey Burnakov:

Yes and yes. A colleague of mine hooks R up to MS SQL.

I'll have to try porting some of my Matlab programs to R to compare the speed. If I manage to figure it out by the weekend, I'll do it and report back. Matlab is pretty slow, so I do a lot of things in C# or C++ and plug them in as DLLs for speed.