Discussion of article "Random Decision Forest in Reinforcement learning" - page 8

 
rogivilela:
Hello everyone,
First of all, I would like to congratulate Maxim Dmitrievsky on his article.
Secondly, I want to say that I am keeping an eye on this topic, because the subject is very interesting.
Thirdly, I would like to ask a question, because I cannot understand how the reward is currently applied in the classification EA. Could anyone describe it?

What I understood is that when the EA closes a position with a negative result, it changes two elements of the output vector (indexes 3 and 4).
Is that correct? And how do I know whether this reward is good? I would like to increase the reward when the trade is positive and gains a certain number of points.


Thank you.
Ps. I used Google Translate, sorry if it is not understandable.

This is exactly what I have been working on almost since the beginning of the article: how to integrate account parameters into the policy and how to update the reward based on profits and losses. But so far I have not been able to implement anything successfully.

I noticed that if we try to feed profits and losses into the reward function, then we have to change this updateReward() function entirely. We may also need to change the matrix implementation completely.

I have a solution for incorporating profits and losses using Q-learning via Bellman's equation, in which floating profits and losses are fed to the agent to update the reward. But we would need to create a new matrix and update the whole matrix on every candle. I am not good at matrix implementation, so I am waiting for the author to publish his next article with new agents.

If anyone is interested in a Q-learning implementation and can handle the matrix part, I can discuss here how to update the reward from profits and losses using the Q-value.

I have been testing the EA with countless combinations of indicators and settings, but I found that there is no way to improve the results without updating the policy. The agent is doing exactly what it was assigned to do: it keeps closing many small profits to increase the win percentage, but the account does not grow overall, because the policy does not treat small and big losses differently.

 

Hi Maxim Dmitrievsky,

Is there any progress on publishing your next article regarding RDF?

Thank you...

 
After updating to build 1940, it no longer works: the calculation returns the value "-nan(ind)". Does anybody know what happened?
[Deleted]  
Igor Vilela:
After updating to build 1940, it no longer works: the calculation returns the value "-nan(ind)". Does anybody know what happened?

Hi, try this library https://www.mql5.com/en/code/22915

or try recompiling

RL GMDH
  • www.mql5.com
This library has extended functionality that allows creating an unlimited number of "Agents". Using the library: an example of filling the input values with normalized close prices. Training takes place in the tester...
 
Thank you Maxim Dmitrievsky, but I have already gone through all of that in this case. I would like to try to fix this error myself, since I am already running a robot based on the idea presented in this article. Could you help identify what caused the error? It stopped working after upgrading to build 1940.
[Deleted]  
Igor Vilela:
Thank you Maxim Dmitrievsky, but I have already gone through all of that in this case. I would like to try to fix this error myself, since I am already running a robot based on the idea presented in this article. Could you help identify what caused the error? It stopped working after upgrading to build 1940.

Try downloading the correct fuzzy library from here; the MT5 update may have replaced it with the default one.

https://www.mql5.com/ru/forum/63355#comment_5729505

Libraries: FuzzyNet - library for working with fuzzy logic
  • 2015.08.26
  • www.mql5.com
8 new membership functions.
 
I managed to solve it, thank you Maxim Dmitrievsky.
I copied the entire MATH folder into the new MetaTrader again and restarted the computer.
[Deleted]  
FxTrader562:

Dear Maxim Dmitrievsky,

Can you please let us know whether you have published your next article on implementing the Random Decision Forest with different agents and without fuzzy logic, which you mentioned previously?

Thank you very much

Hi FxTrader, the new article has been translated and is now available.

I'm not sure about the translation quality, that's not my job, but I guess it's all good.

 

Good afternoon. I don't understand why we need to add anything to MetaTrader for neural network training.

There are weights, and they need to be optimized using MetaTrader's built-in optimization mechanism. Don't you think the MetaTrader developers have already made good progress on training networks / optimizing parameters?

Buy and Sell are performed according to rules defined by indicators. The neural network aggregates "observation data" about these indicators (the number of peaks, the height of peaks before a trade, etc.), but not the indicator values themselves, because that would be nonsense. You can score each configuration of weights right during training, for example: +1 if the market went where it should over the next 2 days, and -1 if it went the wrong way. At the end, each weight configuration has a total score. This is how we select the best configuration of weights according to the user's criterion (it is just another optimization parameter; everything has to be thought through).

The described example takes 40-50 lines of code. That is the whole neural network with training. And I come back to my original question: why do you think that by inventing something complex and poorly understood, you have come closer to the holy grail? The more complex and incomprehensible the black box I create, the more flattered I am by it: look how clever I am!

[Deleted]  
Evgeniy Scherbina:

Good afternoon. I don't understand why we need to add anything to MetaTrader for neural network training.

There are weights, and they need to be optimized using MetaTrader's built-in optimization mechanism. Don't you think the MetaTrader developers have already made good progress on training networks / optimizing parameters?

Buy and Sell are performed according to rules defined by indicators. The neural network aggregates "observation data" about these indicators (the number of peaks, the height of peaks before a trade, etc.), but not the indicator values themselves, because that would be nonsense. You can score each configuration of weights right during training, for example: +1 if the market went where it should over the next 2 days, and -1 if it went the wrong way. At the end, each weight configuration has a total score. This is how we select the best configuration of weights according to the user's criterion (it is just another optimization parameter; everything has to be thought through).

The described example takes 40-50 lines of code. That is the whole neural network with training. And I come back to my original question: why do you think that by inventing something complex and poorly understood, you have come closer to the holy grail? The more complex and incomprehensible the black box I create, the more flattered I am by it: look how clever I am!

When you grow up, you'll understand.

At least, for a start, read about which solvers are used to train neural networks, and why nobody uses genetic algorithms to train them.