Machine learning in trading: theory, models, practice and algo-trading - page 2812

That applies when there are many agent states, for example in games. Here you only have 2-3 states: buy, sell, etc.
No, that's too primitive a view, otherwise this direction wouldn't exist at all.
A state is not buy/sell; buy/sell is an action. A state is, roughly speaking, the cluster number of the current environment, and each state cluster has its own action...
And the action doesn't have to be as primitive as buy/sell, it can be the agent's reasoning about the future, for example...
Like: what if I buy now at candle [i], and on the next candle [i+1] the price falls, but not below a certain level, then I wait for candle [i+2]; but if the price goes even lower I reverse, and if not I hold the buy through [i...20].
This kind of non-trivial reasoning about the future leads to the discovery of a deliberate position...
But there is a myriad of such combinations of reasoning options, so to avoid going through them all we train a Q function; that is, the agent only considers for its reasoning the options that have a good Q value,
and the Q neuron (or matrix) is trained beforehand...
This is how I see it...
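Roughly how I'd sketch that in Python (only a sketch of the idea; the number of state clusters, the three actions, and the tabular Q-learning update are my assumptions, not something from the post):

```python
# state  = cluster index of the current market situation
# action = 0 buy, 1 sell, 2 hold
# Q      = table of expected value for each (state, action) pair, trained beforehand
import numpy as np

n_states, n_actions = 50, 3        # assumed sizes
alpha, gamma = 0.1, 0.95           # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def update(state, action, reward, next_state):
    """One Q-learning step: pull Q[state, action] toward reward + discounted best future value."""
    best_next = Q[next_state].max()
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

def act(state, eps=0.1):
    """Take the action with the best Q value for this state, exploring with probability eps."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(Q[state].argmax())
```

In live use only act() would be called on each candle, so the agent considers only the actions its pre-trained Q table scores well.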
The solution scheme is simple.
)))) yeah, sure...
I'm afraid of those who say "it's simple."
I agree, buy/sell/no-trade are not states. There is a gazillion of states.))))
There aren't that many states (if a state is a cluster).
But there is a huge number of options for reasoning about future actions,
and that reasoning is needed to find the most correct actions in each state; moreover, they should be re-evaluated at each candle.
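The "state = cluster number of the current environment" part could look something like this (the feature choice, the 10-return lookback and the 50 clusters are purely illustrative assumptions):

```python
# Map each candle to a state: cluster the recent returns and use the cluster id as the state.
import numpy as np
from sklearn.cluster import KMeans

N = 10                                             # lookback window of returns per candle
prices = np.cumsum(np.random.randn(5000)) + 1000   # placeholder price series
returns = np.diff(prices)
X = np.lib.stride_tricks.sliding_window_view(returns, N)  # one feature row per candle

kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)

def state_of(last_returns):
    """Return the cluster id (the 'state') of the current candle's recent returns."""
    return int(kmeans.predict(np.asarray(last_returns).reshape(1, -1))[0])
```

Then on every new candle you recompute state_of(returns[-N:]) and re-check which action has the best Q value for that state.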
When you start seeing it right, the wow factor goes away.
You're describing an agent's policy, a multi-step approach. I've written all about it. I write in nerd language so it makes sense, and I'd forgotten about it.
Exactly, it's so primitive.
There's one here who was foaming at the mouth about agents before she was banned.)
Agent states, or actions? I suggest you spend a couple of months reading the books to understand what you've written about, and come to the same conclusions ) Without the environment's reaction to the agent's actions there is nothing to optimise; it all gets done in one pass.
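A tiny illustration of that last point (the reward here, the signed next-candle return, is just an assumption): the feedback the agent receives depends on the action it took, so the Q values can only be learned through interaction rather than read off the data in one pass.

```python
# Hypothetical one-candle environment step: the reward depends on the action taken.
def step(prices, i, action):
    ret = prices[i + 1] - prices[i]   # next-candle price change
    if action == 0:                   # buy
        return ret
    if action == 1:                   # sell
        return -ret
    return 0.0                        # hold
```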