MetaTrader 5 Strategy Tester and MQL5 Cloud Network - page 30

 
Renat:

I'm afraid that with 24 agents on 8 cores (4 physical + Hyper-Threading) you will spend all of the CPU's performance on servicing the infrastructure itself.

Running an excessive number of agents makes their PR (performance rating) drop drastically, which in turn cuts the payment severalfold.

Got it - set everything to 8 agents. Thanks for the information!
 
papaklass:

I hadn't used the cloud in a while. I decided to use it for parameter selection, and the cloud's performance was a pleasant surprise.

If you tinker with a distributed network long enough, you get a good result.
 
Renat:
If you tinker with a distributed network long enough, you get a good result.
PF      0       MQL5 Cloud Europe 2     00:24:16        genetic pass (264, 0, 188) started
JL      0       MQL5 Cloud Europe 2     00:29:07        connection closed
ID      0       MQL5 Cloud Europe 2     00:29:07        connecting to 3.agents.mql5.com:443
GL      0       Tester  00:29:07        cloud server MQL5 Cloud Europe selected for genetic computation
KO      0       MQL5 Cloud Europe 2     00:29:07        connected
JP      0       MQL5 Cloud Europe 2     00:29:10        authorized (server build 696)
RG      0       Tester  00:30:11        Best result 32.12073652718463 produced at generation 20. Next generation 21
KJ      0       MQL5 Cloud Europe       00:30:11        common synchronization completed
GN      0       MQL5 Cloud Europe       01:57:24        connection closed
CI      0       MQL5 Cloud Europe 2     01:57:24        connection closed
MS      3       Tester  01:57:24        genetic pass (21, 285) not processed and added to task queue
II      3       Tester  01:57:24        genetic pass (21, 498) not processed and added to task queue
PO      3       Tester  01:57:24        genetic pass (21, 499) not processed and added to task queue
GQ      0       MQL5 Cloud Europe       01:57:24        genetic pass (21, 285) returned to queue
NF      0       Tester  01:57:24        genetic pass (21, 499) already processed
KN      0       Tester  01:57:24        genetic pass (21, 498) already processed
OJ      0       Core 1  01:57:24        genetic pass (285, 0, 1) started
PS      0       Core 2  01:57:24        genetic pass (285, 0, 1) started
Tasks were handed out to local agents + remote agents + the cloud. One pass hung on the cloud. After almost an hour and a half of waiting I disconnected the cloud, and the tasks were transferred to the local agents. A pass completes within 1-3 minutes:
DP      0       Core 1  02:14:59        genetic pass (23, 256) returned result 4.45 in 45 sec
LH      0       Core 1  02:14:59        genetic pass (273, 0, 1) started
CP      0       Core 5  02:14:59        genetic pass (23, 260) returned result 2.64 in 46 sec
OH      0       Core 5  02:14:59        genetic pass (274, 0, 1) started
PS      0       Core 6  02:15:01        genetic pass (23, 261) returned result 3.37 in 48 sec
HH      0       Core 6  02:15:01        genetic pass (278, 0, 1) started
KQ      0       Core 8  02:15:03        genetic pass (23, 264) returned result -0.01 in 50 sec
CG      0       Core 8  02:15:03        genetic pass (279, 0, 1) started
PP      0       Core 2  02:15:06        genetic pass (23, 257) returned result -0.01 in 52 sec
DG      0       Core 2  02:15:06        genetic pass (280, 0, 1) started
NP      0       Core 3  02:15:07        genetic pass (23, 258) returned result -0.01 in 53 sec

In any case, there is no way a pass should take an hour and a half.

P.S. I enabled the cloud on the fly. Due to a loss of internet, the remote agents dropped out. Then they refused to reconnect (stuck in the "authorized" state; they stayed disconnected for at least two genetic generations) - apparently the tester decided there was enough capacity in the cloud and let the idle agents rest. I disconnected the cloud - the remote agents connected. I turned the cloud back on - and ended up with another hang.

 

The network needs some polishing to avoid such situations (for example: remember the maximum pass time, and if waiting for a pass takes twice as long as that maximum, start the same pass on the best available local (or remote) core).
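
As an illustration of that watchdog idea, here is a minimal MQL5-style sketch. PassInfo and the reissue message are hypothetical - the tester's scheduler is not actually scriptable:

// Illustrative only: the real tester scheduler is internal.
struct PassInfo
  {
   int      id;        // pass identifier
   datetime started;   // when the pass was handed to an agent
   bool     done;      // result already received?
  };

// Reissue any pass that has been pending more than twice the
// longest completed pass time (max_pass_sec).
void CheckStalled(PassInfo &passes[], int max_pass_sec)
  {
   for(int i = 0; i < ArraySize(passes); i++)
      if(!passes[i].done && TimeCurrent() - passes[i].started > 2 * max_pass_sec)
         PrintFormat("pass %d stalled - reissue to best local core", passes[i].id);
  }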

+ TerminalInfoInteger(TERMINAL_MEMORY_AVAILABLE) needs to be refined
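
For reference, here is how that value is read from a script; TERMINAL_MEMORY_AVAILABLE and TERMINAL_MEMORY_PHYSICAL are both standard ENUM_TERMINAL_INFO_INTEGER members, and the values are reported in MB:

// Print what the terminal currently reports about memory.
void OnStart()
  {
   Print("TERMINAL_MEMORY_AVAILABLE: ",
         TerminalInfoInteger(TERMINAL_MEMORY_AVAILABLE), " MB");
   Print("TERMINAL_MEMORY_PHYSICAL:  ",
         TerminalInfoInteger(TERMINAL_MEMORY_PHYSICAL), " MB");
  }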

+ The speed of genetics depends on the speed of the weakest core. My cores have a PR of 160-180, while cloud tasks are handed to cores with PR as low as 100. As a result, every generation my cores sit idle for a significant amount of time, waiting for responses from the cloud before the new population can be generated. I think the 100 PR floor should be dropped and first priority given to agents whose PR exceeds the PR of the weakest local (or remote, if connected) core. Failing that, some load balancing is needed. For example, assume that all passes run at the same speed on a given core (not true in practice, of course, but many EAs can, with some assumptions, be called stable in testing time regardless of parameters). If a local core has PR 150 and a cloud core has PR 100, the local agent should be given 1.5 times more tasks than the cloud agent. Alternatively, for lower-PR agents, don't hand out one task per agent - spread one batch of tasks across a wider range of agents. Downtime would then be minimal. In general, I would like to see progress on this issue.
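
A minimal sketch of that proportional split, under the stated equal-speed assumption (the function and its inputs are illustrative, not part of the tester). With PRs {150, 100} and 500 passes it hands out 300 and 200:

// Distribute total_tasks among agents in proportion to their PR.
void SplitTasksByPR(const int &pr[], int total_tasks, int &tasks[])
  {
   int sum = 0;
   for(int i = 0; i < ArraySize(pr); i++)
      sum += pr[i];
   ArrayResize(tasks, ArraySize(pr));
   for(int i = 0; i < ArraySize(pr); i++)
      tasks[i] = (int)MathRound(total_tasks * (double)pr[i] / sum);
  }

Rounding can leave the totals a task or two off; a real scheduler would hand any remainder to the fastest agents.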

 

In the last 12 hours, the network has hung three more times :(

(And agents with PR < 100 still show up in the genetics journals)

 
By the way, has anyone tried providing agents from an SSD? Considering how my hard drive starts crunching with 8 agents, even without tasks, I suspect an SSD's write resource would be used up quickly. And when testing an EA that is fairly light in terms of computation, the speed of the hard drive starts to be the bottleneck. How many terabytes get pumped through the cache is a good question)
 
sion:
By the way, has anyone tried providing agents from an SSD? Considering how my hard drive starts crunching with 8 agents, even without tasks, I suspect an SSD's write resource would be used up quickly. And when testing an EA that is fairly light in terms of computation, the speed of the hard drive starts to be the bottleneck. How many terabytes get pumped through the cache is a good question)

There is such a letter in the alphabet (I mean an SSD), but I haven't run specific tests, since the server with that drive is at the other end of the city. But IMHO any OS has a disk cache, which smooths out frequent access to the disk.

 
I wonder who gives this cloud so many resources - hardware wear plus electricity clearly cost more than the 2-3 cents per day it pays. I tried to provide resources several times, but it's a lost cause with less than 10 GB free on disk (even with 9 GB of RAM): under certain genetic loads the system simply hangs. Even when it doesn't eat all the free space (RAM and so on, down to swap), the drive still tries to pump through the full cache, which leads to savage slowdowns.
 
No sooner do I write a question down than it immediately resolves itself.
Files:
Picture_61.png  585 kb
 

I decided to optimize a simple grid EA (30-second timer, new-M1-bar check) in "every tick" mode on two pairs. My 4-core i5 (PR 160-170) and 8-core i7 (PR 170-180) spent about 90 (!) hours optimizing.

Then it turned out that passes on the i5 ran twice as slow (although, as I've written several times before, it used to be the other way around - the i5 + WinXP x64 was faster than the i7 + Win7 x64). At first I blamed the memory - the i7 has more of it.

Then I happened to glance at Task Manager and saw that the agents run at the lowest priority (Low) - on both machines. And while on Win7 I managed to raise the priority to Normal, WinXP 64-bit won't allow it for some reason. After half a day with the new priority, testing time on the i7 dropped (seemingly :)) by several hours.

Such "lags" seem to be observed in last two builds (or maybe it only seems to me).

And priority Low is too harsh - my hardware could give agents maximum priority for at least 12 hours a day.

In general, I thought the priority changed automatically depending on resource load, but apparently it does not change by itself :(
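
For local agents (where DLL calls can be enabled - cloud agents forbid DLL imports), one workaround is to bump the hosting process yourself. A hedged sketch, not an official fix: the constant value comes from winbase.h, and the call would go in OnInit() of the EA under test:

#import "kernel32.dll"
long GetCurrentProcess(void);
int  SetPriorityClass(long process, uint priority_class);
#import

#define NORMAL_PRIORITY_CLASS 0x00000020  // value from winbase.h

// Raise the agent process hosting this EA from Low to Normal priority.
bool RaisePriority()
  {
   return(SetPriorityClass(GetCurrentProcess(), NORMAL_PRIORITY_CLASS) != 0);
  }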
