Need help! Can't solve the problem, I'm hitting hardware limitations

 
It is possible to calculate everything in one read of the data - the main thing is the desire to do it.
komposter:

Yes, in this form the task can be parallelized - each time the SeekDate changes, you can run a parallel search for the best Criterion on different parts of the set of sequences. For example, we divide them into 20 parts and give the task to 20 Expert Advisors. They would read the file, find the deal, and send back only the best sequence (№, Criterion and file position).

Thank you all very much!

If we run this algorithm on multiple EAs, we will be limited by the speed of reading from the disk, in parts, from different positions, and that speed is likely to be insufficient.

 
ALXIMIKS:
You can calculate everything in one read of the data - the main thing is the desire to do it.

Reading from disk is quite an expensive operation - running this algorithm on several Expert Advisors will be limited by the speed of reading from the disk, in parts, from different positions, and the required speed is unlikely to be achieved.

Who said that distributed computing must be performed on a single workstation? There are no such restrictions)) A RAM disk will also help, as mentioned above.
 
elugovoy:
Who said that distributed calculations should be done on a single workstation? There are no such restrictions)) A RAM disk is also a great help, as mentioned above.

How about thinking a little and changing the algorithm?

1. How to load several sequences in one pass, and how to know where each one starts and ends.

2. How to calculate the coefficients for all deals in one sequence, and in what data structure to store the answer.

3. How to merge two answers from the previous point into a new, slightly more complete answer (see the sketch after this list).

4. How to divide the final answer from point 3 into the required intervals (we are talking about the SeekDate = trade closing time + 1).

We end up with a slightly different algorithm: by selecting the number of additional parts into which the SeekDate interval is divided, we get a different error compared to the author's original algorithm.
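
To make point 3 concrete, here is a minimal sketch in C++ (not MQL) of merging two partial answers, assuming a partial answer is simply "best Criterion plus file position per sequence"; the structure and names are hypothetical:

    #include <vector>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical partial answer: the best result found so far for one sequence.
    struct SeqBest {
        double  criterion;  // best Criterion value seen in the processed chunk
        int64_t filePos;    // file position of the deal that produced it
        bool    valid;      // false if the chunk contained no deals of this sequence
    };

    // Merge two partial answers (same number of sequences) into a fuller one,
    // keeping the better Criterion for each sequence.
    std::vector<SeqBest> mergeAnswers(const std::vector<SeqBest>& a,
                                      const std::vector<SeqBest>& b) {
        std::vector<SeqBest> out(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (!a[i].valid)      out[i] = b[i];
            else if (!b[i].valid) out[i] = a[i];
            else                  out[i] = (a[i].criterion >= b[i].criterion) ? a[i] : b[i];
        }
        return out;
    }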

 
papaklass:

The bit width of the number is determined by the uniqueness of the record code, not by the bit width of the system. Obviously, it should be a multiple of 1 byte.

64 bits seems to me to be too much. :)

If there are 1,000,000 records, 16 bits would not be enough for a unique record code (maximum 65536 records). That's one.

Look at the architecture of Intel (Itanium) and AMD 32-bit and 64-bit processors (I'm not talking about the operating system). 32/64 bits is the width of the address bus, but at the same time 32/64 bits (4/8 bytes) are read from memory in one machine cycle, even when only one byte is accessed.

Therefore, in terms of performance it makes absolutely no difference whether you read 2 bytes or 8 bytes from memory.
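
For reference, the arithmetic behind the record-code size from the quote above, as a small C++ check (the 1,000,000-record figure is taken from that quote):

    #include <cstdio>
    #include <cmath>

    int main() {
        const unsigned long records = 1000000;                  // records that need unique codes
        int bits  = (int)std::ceil(std::log2((double)records)); // minimum bits for a unique code
        int bytes = (bits + 7) / 8;                             // rounded up to whole bytes
        // Prints: 1000000 records need 20 bits -> 3 bytes (16 bits only cover 65536)
        std::printf("%lu records need %d bits -> %d bytes\n", records, bits, bytes);
        return 0;
    }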

In principle, you could write a Windows Service for this kind of file handling.

But I'm still inclined to use a DBMS.

 
ALXIMIKS:

How about thinking a little and changing the algorithm?

1. How to load several sequences in one pass, and how to know where each one starts and ends.

2. How to calculate the coefficients for all deals in one sequence, and in what data structure to store the answer.

3. How to merge two answers from the previous point into a new, slightly more complete answer.

4. How to divide the final answer from point 3 into the required intervals (we are talking about the SeekDate = trade closing time + 1).

We end up with a slightly different algorithm: by selecting the number of additional parts into which the SeekDate interval is divided, we get a different error compared to the author's original algorithm.

For all 4 points, the data processing is on the DBMS side.

About the "slightly different" algorithm, it's not quite clear what you mean. But. To somehow calculate the error of this "slightly different" algorithm compared to the "author's" one, you need both algorithms to be implemented. And the thread was created precisely because of the technical problem of implementing the "author's" algorithm.

Given this fact, what methodology are you going to use to calculate the error you are talking about?

 
ALXIMIKS:
you can calculate everything in one read - the main thing is the desire.

Reading from disk is quite an expensive operation - running this algorithm on several Expert Advisors will be limited by the speed of reading from the disk, in parts, from different positions, and this idea is unlikely to give any speed gain.

Agreed, the HDD is the slowest device, that's a fact. However, we are not talking about running multiple EAs that would all use these calculations. As I see it, the most likely application is signal generation: say, a cloud server on Amazon with the necessary performance + MT4 + this development = signal provider.

 
elugovoy:

For all 4 points, the data processing is on the DBMS side.

About the "slightly different" algorithm, I'm not sure what you mean. But. To somehow calculate the error of this "slightly different" algorithm in comparison with the "author's" algorithm, both algorithms must be implemented. And the thread was created precisely because of the technical problem of implementing the "author's" algorithm.

Given this fact, what methodology are you going to use to calculate the error you are talking about?

As I understand the author's version, the range will be cut off at the maximum coefficient selected from that range. In the variant I suggested, each such range is divided into N subranges, so that on the merge only one coefficient value can fall into each subrange. At N = 5 the range is divided at the points 0.2, 0.4, 0.6, 0.8, 1, whereas in the author's version the cut-off can fall anywhere between 0 and 1. So at N = 5 the maximum range error is 0.2.

And it all hinges on how correctly the author's posts were interpreted, because there is still no complete clarity.
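
A small illustration of the discretization error being discussed, assuming the coefficient is normalized to [0, 1] (a sketch only, not anyone's actual code):

    #include <cstdio>
    #include <cmath>

    int main() {
        const int    N = 5;     // number of subranges
        const double x = 0.47;  // some coefficient value in [0, 1]
        // Snap the value up to the nearest grid point 0.2, 0.4, 0.6, 0.8, 1.0.
        double snapped = std::ceil(x * N) / N;
        // The worst-case error of this snapping is the subrange width, 1.0 / N = 0.2 here.
        std::printf("x = %.2f -> %.2f (max error %.2f)\n", x, snapped, 1.0 / N);
        return 0;
    }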

 
ALXIMIKS:

As far as I understood the author's version (in case something is wrong again, as there is no clear and complete explanation of what exactly is needed), the range will be cut off at the maximum coefficient selected from that range; in the variant I suggested, each such range is divided into N subranges, so that on the merge only one coefficient value can fall into each subrange. At N = 5 the range is divided at the points 0.2, 0.4, 0.6, 0.8, 1, whereas in the author's version the cut-off can fall anywhere between 0 and 1. So at N = 5 the maximum range error is 0.2.

And it all hinges on how correctly the author's posts were interpreted, because there is still no complete clarity.

Yes, apparently the project was ordered by the Ministry of Finance, there is not enough specific information. However, everyone can find something interesting for themselves from the discussion. I see this as a positive aspect of the discussion.

About the discretization of ranges, your idea is clear. But if N = 100, 1000... (purely mathematically it is possible), then this splitting will hit performance and system resource usage. There is physics as well as mathematics )

 
Candid:

We have a fragment of the file in memory; we go through it and form a sample of the necessary length for calculating the criterion, selecting only the deals that belong to the same sequence. Then we calculate the criterion on this sample. By the way, recursion could be used in the selection.

So you would need to go through several million deals from other sequences! That's exactly what I mean.

ALXIMIKS:

The problem of inserting new data - you will have to solve it somehow.

No problem so far, the amount of data is fixed.

TheXpert:
By the way, if you know the starting position of each sequence, you can search for the right dates with a binary search, since the trades are ordered by time.

+1, thanks for the idea.
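
A minimal sketch of that idea in C++ (not MQL), assuming the deals of one sequence sit in memory as an array sorted by close time; the names are hypothetical:

    #include <vector>
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical deal record with only the field needed for the search.
    struct Deal {
        int64_t closeTime;  // trade closing time, e.g. seconds since epoch
        // ... other fields of the deal ...
    };

    // Index of the first deal with closeTime >= seekDate, found by binary search.
    // Works because the deals of a sequence are ordered by time.
    std::size_t firstDealNotBefore(const std::vector<Deal>& deals, int64_t seekDate) {
        auto it = std::lower_bound(deals.begin(), deals.end(), seekDate,
                                   [](const Deal& d, int64_t t) { return d.closeTime < t; });
        return (std::size_t)(it - deals.begin());
    }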

elugovoy:

1. Based on the above "Let the criterion be the average profit of the last 20 trades of the sequence", this should be understood as a single criterion - the moving expectation of profit. What others are there?

In the database, generate a table with the sequence identifier and the corresponding moving averages. The sequences that don't fit the conditions should be deleted immediately. This should be done by a procedure running in concurrent mode at the DBMS level, at the robot's request, with the process status displayed in the robot.

Let us say, FilterAvgProfit (pProfitValue, pTrades, pDeviation),

where pProfitValue is the target profit, pTrades is the number of trades for the moving average profit, pDeviation is the allowed deviation from pProfitValue.

The result is a populated table with sequence IDs and average profit values.

1a. What the other criteria are doesn't matter. The important thing is that the calculation is performed over a series of trades of the specified length.

1b. How can you discard a sequence just because it has a bad criterion value at the time of closing trade N? What if it becomes better afterwards?
You can only remove those sequences that have failed completely and whose criterion has never shown any profit. And there shouldn't be many of them.
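
For clarity, the criterion mentioned in the quote ("average profit of the last 20 trades of the sequence") as a small C++ sketch; the container of profits is a hypothetical stand-in for however the deals are actually stored:

    #include <vector>
    #include <cstddef>

    // Average profit of the last `window` closed trades of one sequence.
    // Returns false if the sequence does not yet have enough trades.
    bool avgProfitOfLast(const std::vector<double>& profits,  // profits of closed trades, in time order
                         std::size_t window,                  // e.g. 20
                         double& result) {
        if (profits.size() < window) return false;
        double sum = 0.0;
        for (std::size_t i = profits.size() - window; i < profits.size(); ++i)
            sum += profits[i];
        result = sum / (double)window;
        return true;
    }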

elugovoy:

4. As I see it, if we are talking about strategy selection, this operation should not be performed too often (say, on every bar or immediately before opening an order). This approach is reasonable if the current strategy shows N losing trades in a row - then we can choose another one, and it will take time to "make the decision"; there is no getting away from that. Or carry out such a selection once a week (on the weekend, when the market is closed) and either confirm the currently chosen strategy or switch to another one. It would be possible to prepare a list of optimal recommended strategies for the trader under the given conditions. Then, with the market opening and a clear head (on Monday), the trader confirms the choice (or earlier, before the market opens... an e-mail alert, etc.).

Well, that's a matter of ideology. Not about that now ;)

ALXIMIKS:

You allocate memory for an array of structures and get:

Why do you need an array of Criterion values and an array of file pointer positions? (Have you not thought of storing just one Criterion and the last deal?)

The array of Criterion values is there to be able to sort and select the few best sequences (for the future).

The file pointer positions - to continue the search in each sequence from the right place (how else?).
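
What that answer describes, as a hedged C++ sketch (field names are hypothetical): one entry per sequence, so the best few can be picked out by Criterion and reading can resume from the stored file position.

    #include <vector>
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // One entry per sequence.
    struct SeqState {
        int     seqNo;      // sequence number
        double  criterion;  // current Criterion value of the sequence
        int64_t filePos;    // where to continue reading this sequence in the file
    };

    // Pick the few best sequences by Criterion (descending) without a full sort.
    std::vector<SeqState> bestFew(std::vector<SeqState> all, std::size_t count) {
        count = std::min(count, all.size());
        std::partial_sort(all.begin(), all.begin() + count, all.end(),
                          [](const SeqState& a, const SeqState& b) { return a.criterion > b.criterion; });
        all.resize(count);
        return all;
    }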

ALXIMIKS:

Did I get it right:

First pass - search on the interval from 0 to the SeekDate,

then find the best criterion, and SeekDate = trade closing time + 1,

now search on the interval from "trade closing time" to the SeekDate?

and you need X trades to fall into that interval to calculate the criterion for each sequence?

1. Yes, first from 0 to the SeekDate.

2. No. The SeekDate is shifted, and we process the sequence (add trades to the array) on the interval "previously processed trade - SeekDate".
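
The way I read that exchange, the loop looks roughly like the C++ sketch below; it is only my interpretation, with hypothetical types, and the criterion is again the average profit of the last 20 trades:

    #include <vector>
    #include <cstddef>
    #include <cstdint>

    struct Deal { int64_t closeTime; double profit; };

    struct Sequence {
        std::vector<Deal>   deals;        // all deals of the sequence, ordered by time
        std::size_t         nextDeal = 0; // first deal not processed yet
        std::vector<double> profits;      // profits of the deals processed so far
        double              criterion = 0.0;
    };

    // Add the deals closed before seekDate and recompute the criterion
    // (average profit of the last 20 processed deals).
    void processUpTo(Sequence& s, int64_t seekDate) {
        while (s.nextDeal < s.deals.size() && s.deals[s.nextDeal].closeTime < seekDate)
            s.profits.push_back(s.deals[s.nextDeal++].profit);
        const std::size_t n = 20;
        if (s.profits.size() >= n) {
            double sum = 0.0;
            for (std::size_t i = s.profits.size() - n; i < s.profits.size(); ++i)
                sum += s.profits[i];
            s.criterion = sum / (double)n;
        }
    }

    // Main loop: at each SeekDate bring every sequence up to date, pick the one
    // with the best criterion, then shift SeekDate to the closing time of its
    // next trade + 1 and continue from where each sequence left off.
    void run(std::vector<Sequence>& seqs, int64_t startDate, int64_t endDate) {
        int64_t seekDate = startDate;
        while (seekDate <= endDate) {
            Sequence* best = nullptr;
            for (auto& s : seqs) {
                processUpTo(s, seekDate);
                if (!best || s.criterion > best->criterion) best = &s;
            }
            if (!best || best->nextDeal >= best->deals.size()) break;
            seekDate = best->deals[best->nextDeal].closeTime + 1;  // SeekDate = trade closing time + 1
        }
    }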

Renat:

These are strange results.

Here's from our working server system under load:

  • with SSD: 200 MB per second, NTFS
  • with RAM disk: 2000-2500 MB per second, FAT32, Softperfect RAM Disk 3.4.5

Without RAM disks, it takes many times longer to build projects.

I'm starting to develop an inferiority complex.

I should probably make a test script and attach a file, so you can check it on a similar task.

I have a normal hard drive - wdc wd1002FAEX-00Y9A0, judging by the specs, the maximum speed is 126 MB/s:

Judging from the review, that's about what you can squeeze out. Maybe I'm doing something wrong?

Let's have a look at the script...

ALXIMIKS:
That's what I'm talking about - you have to read big files in big chunks; otherwise, reading in small ones can take 10 or more times longer.

How do you read a big chunk?
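
As an aside, a minimal C++ (not MQL) sketch of reading a file in large chunks, just to illustrate what "a big chunk" means in practice; the 16 MB chunk size is an arbitrary assumption:

    #include <cstdio>
    #include <cstddef>
    #include <vector>

    // Read a binary file sequentially in large chunks instead of many small reads.
    // Returns the total number of bytes read, or -1 if the file could not be opened.
    long long readInChunks(const char* path) {
        const std::size_t chunkSize = 16u * 1024u * 1024u;  // 16 MB per read (arbitrary choice)
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return -1;
        std::vector<char> buffer(chunkSize);
        long long total = 0;
        std::size_t got = 0;
        while ((got = std::fread(buffer.data(), 1, buffer.size(), f)) > 0) {
            total += (long long)got;
            // ... parse the deals contained in buffer[0 .. got) here ...
        }
        std::fclose(f);
        return total;
    }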

papaklass:

In my opinion, the solution to the problem lies in encoding the raw data.

How do we encode the full information about a deal without losing anything?

 
Renat:

Strange results.

Here's from our production server system under load:

  • with SSD: 200 MB per second, NTFS
  • with RAM disk: 2000-2500 MB per second, FAT32, Softperfect RAM Disk 3.4.5

Without RAM disks it takes many times longer to build projects.

I forgot to write about memory.

DDR3, 1333 MHz:

Softperfect RAM Disk 3.4.5, though I used NTFS (does it make any difference?)

And another detail - the RAM disk is 12000 MB, and there is only 1-2 GB of free memory (for work).
