Discussion of article "Handling ZIP Archives in Pure MQL5" - page 9

 
Forester #:
I downloaded and unzipped almost 300 files. The data in them keeps growing and has now hit a size limit.
One file should contain 1.8 billion char elements, but after unpacking it is truncated to 1.5 billion, so some data is lost. It is strange that it is cut that short, since arrays can hold up to 2147483647 elements.

I filtered out the unpacked files exceeding a certain size (for different files, from 1.7 GB up to 2136507776 bytes, i.e. almost MAX_INT = 2147483647, the maximum number of array elements) that were truncated on output. It turns out they are all flagged as erroneous by:

CryptDecode(CRYPT_ARCH_ZIP, m_file_puck, key, file_array);

I.e. the return value is 0.
But CZip does not check this, so I added code that resets the output array size to zero.
Now my functions can determine with a 100% guarantee whether a file was unpacked successfully.
Before that I checked for the correct JSON file ending }\r\n, but that check is not universal: it seems a few files out of ~1000 happened to be cut off at an intermediate line and were accepted as successfully decompressed even though their data was incomplete.

New version of the function:

void CZipFile::GetUnpackFile(uchar &file_array[])
{
   static const uchar key[] = {1, 0, 0, 0};
   if(m_header.comp_method)
   {
      int dSize = CryptDecode(CRYPT_ARCH_ZIP, m_file_puck, key, file_array);
      if(dSize == 0)  // decode failed: discard the truncated output
      {
         Print("Error in CryptDecode. Array size: ", ArraySize(file_array));
         ArrayResize(file_array, 0);  // reset size to 0
      }
   }
   else
   {
      ArrayCopy(file_array, m_file_puck);
   }
}

The new part is highlighted in yellow.

Perhaps the developers should also reset the array to zero, since truncated data is hardly useful to anyone and may lead to hard-to-spot errors.

 
Forester #:

I filtered out the unpacked files that were truncated on output; all of them turned out to be flagged as erroneous by CryptDecode returning 0. But CZip does not check this, so I zero the output array myself.


Perhaps the developers should also reset the array to zero, since truncated data is hardly useful to anyone and may lead to hard-to-spot errors.

Firstly, a return value of 0 is not a 100% sign of an error when unzipping (this is probably a leftover from the times when the function was used only for encryption, not compression; it should probably be changed, but they are unlikely to do so, to preserve backwards compatibility), because ZIP technically allows archiving empty data: the receiver array will then be empty legitimately, without any error.

Secondly, partially unpacked data may be useful for diagnosing errors, so it is better to keep it.

 
Stanislav Korotky #:

Firstly, a return value of 0 is not 100% indicative of an error

With my ~1000 files, the 0 flag allowed me to discard all the files I had already caught with the end-of-file pattern (i.e. it worked 100%), plus another 5 that were apparently clipped at an end of line.
So it is more reliable than my approach.

Secondly, partially unpacked data may be useful for diagnosing errors, so it is better to keep it.

In my case about 5 truncated files remain that ended exactly like the expected end of file and cannot be checked in any other way; i.e. this is not error diagnostics but errors being silently skipped by the working algorithm.
What error diagnostics would you suggest?

However, I have solved the problem for myself. But the errors went unnoticed until I dug into someone else's code. If the array were zeroed on the MetaTrader side, they would not have occurred.

In general, there are arguments both for and against. The developers will decide what is better/more logical.

PS. They could also return -1 for a ZIP error, to distinguish it from a legitimately zero-length file.
 
Forester #:

In my case about 5 truncated files remain that ended exactly like the expected end of file and cannot be checked in any other way; i.e. this is not error diagnostics but errors being silently skipped by the working algorithm.

What error diagnostics would you suggest?

However, I have solved the problem for myself. But the errors went unnoticed until I dug into someone else's code. If the array were zeroed on the MetaTrader side, they would not have occurred.

I don't understand how the errors could have been skipped. The function returned non-zero and filled the array with data partially missing from the archive? Then this is a bug in MQL5 and it should be fixed there.

Was the _LastError flag checked?

An "expected end of file" should never be assumed, because archiving is a general-purpose thing: we may not know the format of the file we receive.

 
Stanislav Korotky #:

I don't understand how the errors could have been skipped.

This is a flaw in the CZip library from the article: there was no check of the unpacking result. It was just

CryptDecode(CRYPT_ARCH_ZIP, m_file_puck, key, file_array);

Just today I found this and added the check (highlighted in yellow):

int dSize = CryptDecode(CRYPT_ARCH_ZIP, m_file_puck, key, file_array);
if(dSize == 0)
{
   Print("Error in CryptDecode. Array size: ", ArraySize(file_array));
   ArrayResize(file_array, 0);  // reset size to 0
}

When I started using the library I did not realise there was no check; that is why I invented my own check for the expected end of file.

An "expected end of file" should never be assumed, because archiving is a general-purpose thing: we may not know the format of the file we receive.

I agree; that is why I zero the array on error.
It is the most universal thing that can be done, apart from the zero-file-size case.
But even with a zero-size file neither I nor anyone else will parse the JSON: loops over elements simply will not start when the element count is 0. So zeroing the array is, in my opinion, a good solution.

 
Attached is a small modification of the library code for the new realities (b5223+) of MQL5.
New version of the MetaTrader 5 platform build 5200: OpenBLAS extension and stronger control in MQL5
  • 2025.08.20
  • www.mql5.com
An updated version of the MetaTrader 5 platform will be released on Friday, 1 August 2025...
Files:
ZipHeader.mqh  27 kb
 
Forester #:

I came across an archive that CZip could not decompress, while 7-Zip and the Windows archiver decompress it without problems.
I printed the size of the compressed data: it turned out to be tens of megabytes smaller than the archive (and there is only one file in it).

I started looking for where it is calculated and found it in the FindZipFileSize() function.

I experimented a little.
It turned out that if you return the entire end_size as the data size, the archive unpacks correctly. Apparently the decoder itself determines the end of the data when unpacking instead of relying on the value from this function; the main thing is that the value must not be smaller. One could leave it like that, but then the function turns out to be useless, which seems unlikely, and perhaps some other archives would then fail...
One more experiment showed that if you comment out the lines

The archive also starts unpacking. The amount of data is close to 100%.

It turns out that the archive contains uint cdheader = 0x504b0102; as part of the compressed data, not as an end-of-data marker.

Did you pick the wrong marker? I found this marker in an Internet search. Maybe it should be handled some other way instead of cutting the data at it; in my case 30 MB were cut off.

The function working with this file is in \Include\Zip\Zip.mqh.

I can send you the archive file in a private message if you want to look into it.

The thing is that the ZIP format itself is precisely regulated and documented, but the various packers built on top of this standard (Windows, Total Commander, 7-Zip, etc.) are each very creative in filling the header structures. So CZip cannot rely on the format being filled in correctly and calculates what it can on its own. What is going on with the 0x504b0102 identifier needs to be sorted out.

We should go over the code again and roll out a working update that takes your valuable comments into account. I'm glad someone is using the library.

 
fxsaber #:
Attached is a small modification of the library code for the new realities (b5223+) of MQL5.

Thank you. MQL5 is more and more senseless and ruthless.

99% of errors could have been avoided if MetaQuotes had finally implemented standard reflection and serialisation functions.

 
Forester #:

I came across an archive that CZip could not unpack. However, 7ZIP and the Windows archiver unpacked it without problems.

...

I can send the archive file to you in a private message if you want to look into it.

Please send the archive file to me in a private message.

 
Vasiliy Sokolov #:

Please send me the archive file in a private message.

https://quote-saver.bycsi.com/orderbook/linear/BTCUSDT/2023-01-18_BTCUSDT_ob500.data.zip

in this archive the sequence cdheader = 0x504b0102; occurs in the archive body (not in the header).

The next file by date has it too. I think it occurs often.

header = 0x504b0304; is present in every file's header, i.e. in the first 4 bytes.
But it also occurred, rarely, in the body of an archive. I'll look for an example now.

Here https://quote-saver.bycsi.com/orderbook/linear/BTCUSDT/2023-03-15_BTCUSDT_ob500.data.zip.


I think these signatures should be searched for only between the bodies of the archived files, not inside them.