Machine learning in trading: theory, models, practice and algo-trading - page 1188

 
Dmitriy Skub:
Yuri, you're just off topic - the key word here is "fork" (surebet). It has nothing to do with ML)

Off topic, that's for sure. But it's an interesting subject, in the abstract.

 
Yuriy Asaulenko:

Automating that, imho, is not so easy. As far as I understand, it resists standard solutions - there are too many possible variants.

The only thing that comes to mind is a database of teams and players, of which there are hundreds.)

At different bookmakers the same team may be named differently: Olympique, Marseille, plus Latin spellings - at least 4 variants. Different prefixes as well: FK - football club, PFC - professional football club.

Automating forks is a pain in the ass.

 

It's a bit too simple.

In fact, the point of RL is not the packages but the approach, i.e. brute-force search. It is used instead of genetic algorithms, but through an approximator such as a neural network

the main difficulty is to sample from the right distributions

 
Maxim Dmitrievsky:

It's too simple.

In fact, the point of RL is not the packages but the approach, i.e. brute-force search. It is used instead of genetic algorithms, but through an approximator such as a neural network

the main difficulty is to sample from the right distributions

Well, a simple example is fine - an example does not need to be complicated - and the fact that there are already ready-made packages is good... But I did not even understand that simple example (( I do not understand why the matrices are filled with probabilities, what those probabilities are needed for, and how they were calculated)

 
mytarmailS:

Well, a simple example is fine - an example does not need to be complicated - and the fact that there are already ready-made packages is good... But I did not even understand that simple example (( I do not understand why the matrices are filled with probabilities, what those probabilities are needed for, and how they were calculated

probabilities of state transitions, Markov chains

for example, the probability of buying under some condition, or of selling

the matrix is filled with all possible states; then the current state is looked up in it and the signal is read off... it's as primitive as a lookup table :)
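The lookup-table idea above can be sketched in a few lines. This is a minimal illustration, not code from the thread: the state names and probability values are hypothetical, and Python is used here even though the thread discusses R packages. Each row of the table holds the transition probabilities out of one state, and a signal is drawn by sampling from the row of the current state:

```python
import random

# Hypothetical transition table for three market "states".
# Each row lists the probabilities of moving to the next state; rows sum to 1.
states = ["flat", "up", "down"]
transition = {
    "flat": {"flat": 0.6, "up": 0.2, "down": 0.2},
    "up":   {"flat": 0.3, "up": 0.5, "down": 0.2},
    "down": {"flat": 0.3, "up": 0.2, "down": 0.5},
}

def next_state(current, rng=random):
    # Look up the row for the current state and sample the next state from it
    row = transition[current]
    r = rng.random()
    cumulative = 0.0
    for state, p in row.items():
        cumulative += p
        if r < cumulative:
            return state
    return state  # guard against floating-point rounding at the boundary

random.seed(1)
print(next_state("flat"))  # prints one of "flat", "up", "down"
```

The whole "table primitive" is just this: look up the current row, sample a next state, act on it.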

 
Maxim Dmitrievsky:

probabilities of state transitions, Markov chains

Well, I got that....

I don't understand their role in the code

 
mytarmailS:

I got that....

I don't understand their role in the code.

What do you mean roles? It's a table of state transitions and probabilities

 
Maxim Dmitrievsky:

What do you mean, roles? It's a table of transitions from state to state and probabilities

I do not understand where the probabilities of transitions come from, we have 4 directions - left, right, up, down. The algorithm must find the way "somewhere" by the correct combination of directions. Even before the algorithm began to find the right combination of created a matrix with transition probabilities, where did they get these probabilities?

I'm probably just slow, but still, if you don't mind, could you explain?

 
mytarmailS:

I do not understand where the probabilities of transitions come from, we have 4 directions - left, right, up, down. The algorithm must find the way "somewhere" through the correct combination of directions. Even before the algorithm began to find the right combination of created a matrix with transition probabilities, where did they get these probabilities?

I'm probably just slow, but still, if you don't mind, explain.

Read up on the basics - not in R, just on the internet.

Initially the probabilities are assigned randomly; then over the iterations they are updated by various methods, mainly the TD method, and in the end they converge to an optimum, i.e. the solution of the original problem - for example, getting out of an apartment with several rooms by the fastest route, without wandering into the other rooms.

For this, a matrix of states (value matrix) and a matrix of transitions (policy matrix) are set up: for each state (being in a certain room) there may be several transitions to other rooms, each with its own probability. After every action a numerical reward (good/bad) is returned; the point of the method is to maximize that reward, i.e. the agent is penalized for wrong transitions and rewarded for right ones.
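The "leave the apartment" example above can be sketched as a tiny TD (Q-learning) loop. This is a hedged illustration, not code from the thread: the room layout, reward of 100 at the exit, and learning constants are all hypothetical, and Python stands in for the R packages being discussed. The Q-table plays the role of the value matrix; the greedy choice over it at the end is the learned policy:

```python
import random

# Hypothetical apartment: rooms 0-4 plus the exit (room 5).
# adjacency[s] lists the rooms reachable from room s.
adjacency = {
    0: [4],
    1: [3, 5],
    2: [3],
    3: [1, 2, 4],
    4: [0, 3, 5],
    5: [1, 4, 5],
}
GOAL = 5
ALPHA, GAMMA = 0.8, 0.9  # learning rate and discount (assumed values)

# Q-table (the "value matrix") starts at zero and converges over iterations
Q = {(s, a): 0.0 for s in adjacency for a in adjacency[s]}

random.seed(0)
for episode in range(500):
    state = random.choice(list(adjacency))
    while state != GOAL:
        action = random.choice(adjacency[state])   # explore randomly
        reward = 100.0 if action == GOAL else 0.0  # reward only at the exit
        # TD update: nudge Q(s, a) toward reward + discounted best next value
        best_next = max(Q[(action, a2)] for a2 in adjacency[action])
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = action

def policy(state):
    # Greedy policy: go to the neighbouring room with the highest learned value
    return max(adjacency[state], key=lambda a: Q[(state, a)])

# Follow the learned policy from room 2 to the exit
path, s = [2], 2
while s != GOAL:
    s = policy(s)
    path.append(s)
print(path)  # a shortest route from room 2 to the exit, e.g. 2 -> 3 -> ... -> 5
```

This shows where the probabilities "come from": nothing is known at the start (the table is random or zero), and the TD updates gradually shape it until the greedy choice at each state traces the fastest way out.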

 
Maxim Dmitrievsky:

Didn't see anything in the thread about information criteria (Bayesian or Akaike). Perhaps they are used by default (in the ML packages being applied)?
