Discussing the article: "Neural networks made easy (Part 57): Stochastic Marginal Actor-Critic (SMAC)"

 

Check out the new article: Neural networks made easy (Part 57): Stochastic Marginal Actor-Critic (SMAC).

Here I will consider the fairly new Stochastic Marginal Actor-Critic (SMAC) algorithm, which makes it possible to build latent-variable policies within the entropy maximization framework.

When building an automated trading system, we develop algorithms for sequential decision making. Reinforcement learning methods are aimed exactly at solving such problems. One of the key issues in reinforcement learning is exploration as the Agent learns to interact with its environment. In this context, the principle of maximum entropy is often used, which motivates the Agent to perform actions with the greatest possible degree of randomness. In practice, however, such algorithms train simple Agents that learn only local changes around a single action. This is due to the need to calculate the entropy of the Agent's policy and use it as part of the training objective.
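As a point of reference, the maximum-entropy objective used by this family of methods (Soft Actor-Critic being the best-known example) augments the expected return with an entropy bonus. The formula below is the standard formulation of that objective, not a quote from the article:

J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t} \gamma^{t} \left( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right) \right]

Here \gamma is the discount factor, r(s_t, a_t) is the reward, \alpha is the temperature that trades exploration off against reward, and \mathcal{H} is the entropy of the action distribution in state s_t. Maximizing this objective requires evaluating (or at least estimating) the policy entropy, which is exactly the term that becomes problematic once latent variables are introduced.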

At the same time, a relatively simple way to increase the expressiveness of an Actor's policy is to use latent variables, which equip the Agent with its own inference procedure for modelling stochasticity in the observations, the environment and the unknown rewards.


Introducing latent variables into the Agent's policy allows it to cover more diverse scenarios that are compatible with historical observations. It should be noted, however, that policies with latent variables do not admit a simple closed-form expression for their entropy. Naive entropy estimation can lead to catastrophic failures in policy optimization. Moreover, high-variance stochastic updates for entropy maximization do not readily distinguish between local random effects and multimodal exploration.
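For reference, a latent-variable policy is the marginal of a conditional action distribution over a latent variable z. The notation below is a common formulation consistent with this approach, not an excerpt from the paper:

\pi(a \mid s) = \int \pi(a \mid s, z)\, p(z \mid s)\, dz

\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]

Because \log \pi(a \mid s) itself contains an integral over z, the entropy has no closed form; estimating it naively (for example, with a single sample of z inside the logarithm) yields a biased, high-variance estimate, which is the failure mode described above.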

One way of addressing these shortcomings of latent-variable policies was proposed in the article "Latent State Marginalization as a Low-cost Approach for Improving Exploration". Its authors propose a simple yet effective policy optimization algorithm capable of providing more efficient and robust exploration in both fully observable and partially observable environments.

Author: Dmitriy Gizlyk

 
It does not compile.
Files:
 
It does not compile for me either. Same thing.
 
star-ik #:
It does not compile.

The archive of files in the article has been updated.

 

Dmitry, thank you for your hard work. Everything is working.

I collect examples with the Research Expert Advisor for 100 passes, train the model with the Study Expert Advisor, and test with Test. Then I collect another 50 passes, train for 10,000 iterations, and test again.

And so on until the model learns. The problem is that so far Test keeps giving different results after each cycle, and they are not always positive. After running a cycle, I do 2-3 tests and the results differ.

At what cycle will the result become stable? Or is it endless work where the result will always be different?

Thank you!

 
Nikolai Fedotov #:
I collect examples with the Research Expert Advisor for 100 passes, train the model with the Study Expert Advisor, and test with Test. Then I collect another 50 passes, train for 10,000 iterations, and test again.

And so on until the model learns. The problem is that so far Test keeps giving different results after each cycle, and they are not always positive. That is, I run a cycle, then 2-3 tests, and the results differ.

At what cycle will the result become stable? Or is it endless work where the result will always be different?

Thank you!

The Expert Advisor trains a model with a stochastic policy. This means that the model learns the probabilities of maximising the reward when taking particular actions in particular states of the system. While interacting with the environment, actions are sampled according to the learnt probabilities. In the initial stage, the probabilities of all actions are equal and the model selects an action at random. As training progresses, the probabilities shift and the choice of actions becomes more deliberate.
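To illustrate what sampling "with the learnt probabilities" means in practice, here is a minimal MQL5 sketch; SampleAction is a hypothetical helper, not part of the article's code, and it simply draws an action index from a probability vector produced by the Actor:

//--- Hypothetical helper: sample an action index from a probability vector
int SampleAction(const vector &probs)
  {
//--- draw a uniform random value in [0, 1]
   double r = double(MathRand()) / 32767.0;
//--- walk the cumulative distribution until it covers r
   double cumulative = 0.0;
   for(int i = 0; i < (int)probs.Size(); i++)
     {
      cumulative += probs[i];
      if(r <= cumulative)
         return i;
     }
//--- fallback for rounding errors
   return (int)probs.Size() - 1;
  }

Early in training the probabilities are close to uniform, so such a routine effectively picks actions at random; as training shifts the distribution, it increasingly returns the higher-probability actions. That is also why individual test passes can still differ from one another even after many training cycles.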

 

Hello Dmitry. How many cycles did it take you, following the procedure Nikolay described above, to get a stable positive result?

And another interesting question: if an Expert Advisor is trained on the current period and, say, a month later needs to be retrained to take new data into account, will it be retrained from scratch or fine-tuned? Will that training process be comparable to the initial one, or much shorter and faster? Likewise, if we have trained a model on EURUSD, will adapting it to GBPUSD take as long as the initial training, or will a shorter fine-tuning be enough? This question is not about this particular article of yours, but about all your Expert Advisors that work on the reinforcement learning principle.

 

Good day.

Dimitri, thank you for your work.

I want to clarify for everyone...

What Dimitri is posting is not a "Grail".

It is a classic example of an academic problem, intended as preparation for research work of a theoretical and methodological nature.

And everyone wants to see a positive result on their account, right here and now....

Dmitry teaches us how to solve (our/my/your/their) problem using all the methods he presents.

Popular AI (GPT) has over 700 Million parameters!!!! How much is this AI?

If you want to get a good result, exchange ideas (add parameters), give test results, etc.

Create a separate chat room and "get" the result there. You can brag here :-), thus showing the effectiveness of Dmitry's work...

 
Oleg_Filatov #:
If you want to get a good result, exchange ideas (add parameters), give test results, etc.

Create a separate chat room and "get" the result there. You can brag here :-), thus showing the effectiveness of Dmitry's work...

Mate, nobody is waiting for the grail here! I would just like to see that what Dmitriy puts out actually works: not from Dmitry's words in his articles (his articles almost always show positive results), but on my own computer. I downloaded the Expert Advisor from this article and have already done 63 cycles of training (data collection -> training), and it is still losing money. Across all 63 cycles there were only a couple of data collections where 5-6 out of 50 new examples were positive. Everything else is in the red. How can I see that it really works?

I asked Dmitriy about this in the post above, and he didn't answer. The same problem in other articles: no result, no matter how much you train...

Friend, if you did get a stable result, then write how many cycles it took you to reach it, for example in this article. If something needs to be changed, what should I change to see a result on my own computer, even just in the tester? Not a grail, but at least something that shows it works...?

 
Oleg_Filatov #:
If you want to get a good result, exchange ideas (add parameters), give test results, etc.

Create a separate CHAT and "get" the result there. You can brag here :-), thereby showing the effectiveness of Dmitry's work ...

Enjoy <3

Here are the parameters (based on Dmitry's work and some research of my own):
// Input parameters for RSI
input group "---- RSI ----"
input int RSIPeriod = 14; // Period
input ENUM_APPLIED_PRICE RSIPrice = PRICE_CLOSE; // Applied price

// Input parameters for CCI
input group "---- CCI ----"
input int CCIPeriod = 14; // Period
input ENUM_APPLIED_PRICE CCIPrice = PRICE_TYPICAL; // Applied price

// Input parameters for ATR
input group "---- ATR ----"
input int ATRPeriod = 14; // Period

// Input parameters for MACD
input group "---- MACD ----"
input int FastPeriod = 12; // Fast
input int SlowPeriod = 26; // Slow
input int SignalPeriod = 9; // Signal
input ENUM_APPLIED_PRICE MACDPrice = PRICE_CLOSE; // Applied price

// Input parameters for Momentum
input group "---- Momentum ----"
input int MomentumPeriod = 14; // Period for Momentum
input ENUM_APPLIED_PRICE AppliedPrice = PRICE_CLOSE; // Applied price for Momentum

// Input parameters for SAR
input group "---- SAR ----"
input float SARStep = 0.02f; // SAR Step
input float SARMaximum = 0.2f; // SAR Maximum

// Input parameters for Bands
input group "---- Bands ----"
input int BandsPeriod = 20; // Period for Bands
input double BandsDeviation = 2.0; // Bands Deviation
input int BandsShift = 0; // Bands Shift

#include "FQF.mqh"
//---
#define HistoryBars 72 //Depth of history
#define BarDescr 14 //Elements for 1 bar description
#define AccountDescr 12 //Account description
#define NActions 6 //Number of possible Actions
#define NRewards 5 //Number of rewards
#define EmbeddingSize 64
#define Buffer_Size 6500
#define DiscFactor 0.99f
#define FileName "zJimReaper_NNM_Neural_Network_"
#define LatentLayer 11
#define LatentCount 2048
#define SamplLatentStates 32
#define MaxSL 1000
#define MaxTP 1000
#define MaxReplayBuffer 500
#define StartTargetIteration 50000
#define fCAGrad_C 0.5f
#define iCAGrad_Iters 15
#define KNN 32
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
bool CreateDescriptions(CArrayObj *actor, CArrayObj *critic, CArrayObj *convolution)
{
//---
CLayerDescription *descr;
//---
if(!actor)
{
actor = new CArrayObj();
if(!actor)
return false;
}
if(!critic)
{
critic = new CArrayObj();
if(!critic)
return false;
}
if(!convolution)
{
convolution = new CArrayObj();
if(!convolution)
return false;
}
//--- Actor
actor.Clear();
//--- Input layer
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
int prev_count = descr.count = (HistoryBars * BarDescr);
descr.activation = None;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 1
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBatchNormOCL;
descr.count = prev_count;
descr.batch = 1000;
descr.activation = None;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 2
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
prev_count = descr.count = BarDescr;
descr.window = HistoryBars;
descr.step = HistoryBars;
int prev_wout = descr.window_out = HistoryBars / 2;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 3
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
prev_count = descr.count = prev_count - 1;
descr.window = 7;
descr.step = 3;
descr.window_out = 32;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 4
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
prev_count = descr.count = prev_count - 1;
descr.window = 5;
descr.step = 2;
descr.window_out = 16;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}

//--- layer 5
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
prev_count = descr.count = prev_count - 1;
descr.window = 3;
descr.step = 1;
descr.window_out = 8;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 6
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
prev_count = descr.count = BarDescr;
descr.window = HistoryBars;
descr.step = HistoryBars;
prev_wout = descr.window_out = HistoryBars / 2;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 7
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
prev_count = descr.count = prev_count;
descr.window = prev_wout;
descr.step = prev_wout;
descr.window_out = 32;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 8
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = 2 * LatentCount;
descr.optimisation = ADAM;
descr.activation = LReLU;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 9
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
prev_count = descr.count = LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 10
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConcatenate;
descr.count = 4 * LatentCount;
descr.window = prev_count;
descr.step = AccountDescr;
descr.optimisation = ADAM;
descr.activation = SIGMOID;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 11
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronVAEOCL;
descr.count = 2 * LatentCount;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 12
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = 2 * LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 13
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 14
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 15
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = 2 * NActions;
descr.activation = SIGMOID;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- layer 16
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronVAEOCL;
descr.count = NActions;
descr.optimisation = ADAM;
if(!actor.Add(descr))
{
delete descr;
return false;
}
//--- Critic
critic.Clear();
//--- Input layer
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
prev_count = descr.count = 2 * LatentCount;
descr.activation = None;
descr.optimisation = ADAM;
if(!critic.Add(descr))
{
delete descr;
return false;
}
//--- layer 1
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConcatenate;
descr.count = 2 * LatentCount;
descr.window = prev_count;
descr.step = NActions;
descr.optimisation = ADAM;
descr.activation = LReLU;
if(!critic.Add(descr))
{
delete descr;
return false;
}
//--- layer 2
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = 2 * LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!critic.Add(descr))
{
delete descr;
return false;
}
//--- layer 3
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!critic.Add(descr))
{
delete descr;
return false;
}
//--- layer 4
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = LatentCount;
descr.activation = LReLU;
descr.optimisation = ADAM;
if(!critic.Add(descr))
{
delete descr;
return false;
}
//--- layer 5
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = NRewards;
descr.optimisation = ADAM;
descr.activation = None;
if(!critic.Add(descr))
{
delete descr;
return false;
}
//--- Convolution
// Define common parameters
int input_size = (HistoryBars * BarDescr) + AccountDescr;
int num_actions = NActions;
int embedding_size = EmbeddingSize;
// Create a neural network
convolution.Clear();
// Input layer 0
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = 4 * input_size;
descr.activation = None;
descr.optimisation = ADAM;
if (!convolution.Add(descr))
{
delete descr;
return false;
}
// Layer 1
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = 2 * LatentCount;
descr.window = 2 * input_size;
descr.step = 2 * num_actions;
descr.activation = SIGMOID;
descr.optimisation = ADAM;
if (!convolution.Add(descr))
{
delete descr;
return false;
}
// Layer 2
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = LatentCount;
descr.window = input_size;
descr.step = num_actions;
descr.activation = SIGMOID;
descr.optimisation = ADAM;
if (!convolution.Add(descr))
{
delete descr;
return false;
}
// Convolutional layers
for (int i = 0; i < 6; i++)
{
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronConvOCL;
descr.count = 2 * LatentCount / (1 << i); // Halve the count with each layer
descr.window = 64;
descr.step = 64;
descr.window_out = 32 / (1 << i); // Halve the window_out
descr.activation = LReLU;
descr.optimisation = ADAM;
if (!convolution.Add(descr))
{
delete descr;
return false;
}
}
// Output layer
if (!(descr = new CLayerDescription())) return false;
descr.type = defNeuronBaseOCL;
descr.count = embedding_size;
descr.activation = LReLU;
descr.optimisation = ADAM;
if (!convolution.Add(descr))
{
delete descr;
return false;
}
// Successfully created the network
return true;
}
#ifndef Study
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
bool IsNewBar(void)
{
=== I cut the last parts, as comments are limited to 64000 characters, but you know what to do... =)