Machine learning in trading: theory, models, practice and algo-trading - page 3231

 

There is a site that records the publication of new R packages...

I have always wanted to download this data and aggregate it by time (how many new packages are published per day) to understand what is going on with the language...

But I kept putting it off: I didn't know how to do it and thought it would be long and complicated...

Today I decided to do it. It took about 5 minutes to work out how and 30 seconds for the code itself)))

Here is the code:

library(rvest)
library(xts)
url <- "https://cran.r-project.org/web/packages/available_packages_by_date.html"

tb <- url |> read_html() |> html_table() |> _[[1]] |> {\(.) .[nrow(.):1,]}()

# rle() returns both the run lengths and the dates; bind the result to r to use both
tb$Date |> rle() |> (\(r) xts(r$lengths, as.POSIXct(r$values)))() |> plot(main="Number of new packages", col=4)

downloading data from the site + aggregation by day + visualisation.


2008-09-08                                           1
2008-10-28                                           1
2010-06-25                                           1
2010-07-07                                           1
2011-08-18                                           1
2011-09-07                                           1
2011-12-01                                           1
2011-12-28                                           1
2012-01-28                                           1
2012-03-01                                           1
       ...                                            
2023-09-01                                          44
2023-09-02                                          35
2023-09-03                                          37
2023-09-04                                          32
2023-09-05                                          72
2023-09-06                                          84
2023-09-07                                          58
2023-09-08                                          45
2023-09-09                                          26
2023-09-10                                          27

I get an average of 40-50 new packages per day.
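
As a rough check of that figure, here is a minimal sketch that averages only the ten most recent daily counts printed above (2023-09-01 to 2023-09-10, R output). It is not a long-run average, and the variable names are mine.

```python
from statistics import mean

# Daily counts for 2023-09-01 .. 2023-09-10, copied from the R output above.
daily_counts = {
    "2023-09-01": 44,
    "2023-09-02": 35,
    "2023-09-03": 37,
    "2023-09-04": 32,
    "2023-09-05": 72,
    "2023-09-06": 84,
    "2023-09-07": 58,
    "2023-09-08": 45,
    "2023-09-09": 26,
    "2023-09-10": 27,
}

# Arithmetic mean over the ten days shown.
average = mean(daily_counts.values())
print(f"average new packages per day: {average:.1f}")
```

For these ten days the mean lands inside the 40-50 range.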

=========================================================================


And here is the same code in the much-praised Python, which is supposedly the best for parsing and for everything else

import pandas as pd
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

url = "https://cran.r-project.org/web/packages/available_packages_by_date.html"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))[0]
df = df.iloc[::-1]

df['Date'] = pd.to_datetime(df['Date'])
df_grouped = df.groupby('Date').size()

df_grouped.plot(title="Number of new packages")
plt.show()

Date
2008-09-08     1
2008-10-28     1
2010-06-25     1
2010-07-07     1
2011-08-18     1
              ..
2023-09-06    84
2023-09-07    58
2023-09-08    45
2023-09-09    26
2023-09-10    29
Length: 2807, dtype: int64
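
A side note on the two aggregations: the R version counts consecutive runs with rle(), while the Python version counts all rows per date with groupby().size(). The two agree only when equal dates sit next to each other, which I assume holds here because the CRAN table is ordered by date. A minimal stdlib sketch with made-up dates (run_lengths is a hypothetical helper, not part of any library):

```python
from collections import Counter
from itertools import groupby


def run_lengths(values):
    """Count consecutive runs, like R's rle(): returns (value, length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical dates, sorted the way the CRAN table is assumed to be.
sorted_dates = [
    "2023-09-08", "2023-09-08",
    "2023-09-09",
    "2023-09-10", "2023-09-10", "2023-09-10",
]
# The same dates with one row out of place.
shuffled_dates = [
    "2023-09-08", "2023-09-09", "2023-09-08",
    "2023-09-10", "2023-09-10", "2023-09-10",
]

runs_sorted = run_lengths(sorted_dates)
counts_sorted = sorted(Counter(sorted_dates).items())
runs_shuffled = run_lengths(shuffled_dates)

print(runs_sorted == counts_sorted)  # run counting matches per-date counting
print(runs_shuffled)                 # "2023-09-08" is split into two runs
```

On unsorted input the run-based count splits a date into several entries, so sorting first (or grouping by value) is the safer choice.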

===========================================================


7 lines of code in R, 18 lines of code in python...

 
Renat Fatkhullin #:

Essentially, everything will be settled as we work out the terms of the competition; it's a huge amount of work.

We will wait for the detailed conditions.

However, most people have their own developments, which cannot be fitted to standards or templates meant for everyone. And the most valuable things are not models, but predictors.

If the predictors are a fixed set, the training task will come down to selecting the significant ones. But if it is possible to adjust the settings of these predictors, it will be more fun.

 
Andrey Dik #:

P.P.S. "The evening stops being languid": it will be very interesting to see what comes of it. After all, you do care, fxsaber, don't you? And so do I))))))))))))))))))))))))))

Probably one of the stupidest things is to try to change another person's mind. No amount of argument works. Hence the other "P" word - bygones.

 
fxsaber #:

Probably one of the stupidest things is to try to change another person's mind. No amount of argument works. Hence the other "P" word - bygones.

their contest == their terms.

That's a silly thing to argue with.

 

I'll have to increase the amount of effort to be able to install packages at an accelerated rate

import requests, pandas as pd, matplotlib.pyplot as plt
from bs4 import BeautifulSoup

df = pd.read_html(requests.get("https://cran.r-project.org/web/packages/available_packages_by_date.html").text, flavor='bs4')[0].iloc[::-1]
df.groupby('Date').size().plot(title="Number of new packages")
plt.show()

Once again you've screwed up: you just smeared the lines across the screen and passed it off as the truth.
 
Maxim Dmitrievsky #:

I'll have to increase the amount of effort to be able to install packages at an accelerated rate

Once again you've screwed up: you just smeared the lines across the screen and passed it off as the truth.

If I compress the code in the same unusable way you did, I'll end up with 2 lines in total, including the library declarations...

So you're out of luck here too.
 
mytarmailS #:
If I compress the code in the same unusable way you did, I'll end up with 2 lines in total, including the library declarations...

So you're out of luck here too.

It's a perfectly usable way with perfectly understandable syntax, whereas you are learning the Predator's alphabet to write code in squiggles that aren't even on the keyboard )).

As a result, you can't even write a loop without errors in other languages, because you are used to hieroglyphs.

 

One can only welcome the organisation of a championship aimed at popularising ML.

Or so it would seem.

However, the technical framework of the championship, namely Python and ONNX, leaves the true diversity of models available in ML outside the championship.

Hundreds of models are left out. It is these models that define the meaning of the term "machine learning", and the various neural networks are a small and not the most interesting part of ML for trading.

I attach a rather old list (2015) of models available within the caret wrapper, i.e. available through the train function. The composition of some groups of models is not expanded, as the list reflects my tastes.

 
It's a barrier against a cult of madmen.
 
Maxim Dmitrievsky #:
It's a barrier against a cult of madmen.

It is also a barrier against home-made models, including those written in MQL5.

If there were a converter from a tree (for example, from conditions like if{ if{...}else{...} }else{ if{...}else{...} }) to ONNX, maybe I would participate. As it is, only standard models that already have a converter can be used, but that is worth thinking about too... The prizes are not bad, maybe I'll make something with CatBoost.
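
To make the idea of such a converter more concrete, here is a minimal sketch of only its first step: flattening a nested if/else tree into a flat node table (feature index, threshold, true/false child ids, leaf values). This is roughly the shape in which the ONNX ML tree-ensemble operators describe trees through their attributes. The sketch does not call the onnx library or write a model file. The tuple layout, the function names and the example tree are my assumptions; only the mode names "BRANCH_LT" and "LEAF" are borrowed from ONNX.

```python
# Hypothetical nested form of an if/else tree:
#   ("split", feature_index, threshold, true_branch, false_branch)
#   ("leaf", value)
# Equivalent to: if x[0] < 0.5 { if x[1] < 10 {-1} else {0} } else {1}
tree = (
    "split", 0, 0.5,
    ("split", 1, 10.0, ("leaf", -1.0), ("leaf", 0.0)),
    ("leaf", 1.0),
)


def flatten(root):
    """Turn the nested tree into a list of node records indexed by node id."""
    nodes = []

    def visit(node):
        node_id = len(nodes)
        nodes.append(None)  # reserve the slot so children get later ids
        if node[0] == "leaf":
            nodes[node_id] = {"mode": "LEAF", "value": node[1]}
        else:
            _, feature, threshold, true_branch, false_branch = node
            nodes[node_id] = {
                "mode": "BRANCH_LT",  # go to true_id when x[feature] < threshold
                "feature": feature,
                "threshold": threshold,
                "true_id": visit(true_branch),
                "false_id": visit(false_branch),
            }
        return node_id

    visit(root)
    return nodes


def predict(nodes, x):
    """Walk the flat table to check that it behaves like the original if/else."""
    node = nodes[0]
    while node["mode"] != "LEAF":
        go_true = x[node["feature"]] < node["threshold"]
        node = nodes[node["true_id"] if go_true else node["false_id"]]
    return node["value"]


flat = flatten(tree)
print(len(flat))
print(predict(flat, [0.2, 5.0]), predict(flat, [0.2, 20.0]), predict(flat, [0.9, 0.0]))
```

A real converter would still have to map this table onto the operator's attribute lists and build the ONNX graph around it, which is not attempted here.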
