Need help! Can't solve the problem, I'm hitting hardware limitations

 

What you are looking for can be compressed and searched in compressed format. This will reduce the amount of data and allow you to keep all the data in RAM. (theoretically)

 
Integer:

What you are looking for can be compressed and searched in compressed format. This will reduce the amount of data and allow you to keep all the data in RAM. (theoretically)

imho - to search compressed data you must first uncompress it - or, theoretically, write an algorithm that searches the data in compressed form - go nuts

and the problem is - MT4 (32-bit) can't hold much in RAM

---

it is logical to store a large volume of data in a database and use queries to get back ready-calculated results

better to apply ready-made solutions than to invent your own data-processing mechanism

SQL is very good at storing big data, although 20 gigs isn't much for SQL...

---

You could write your own mechanism: read the file in parts and allocate the maximum possible amount of memory to each read fragment to speed things up

and several such fragments from the 20 gigs will have to be read to complete one calculation cycle
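A hypothetical sketch of such a mechanism (Python for illustration; the file name, fragment size and processing step are all assumptions, not anything from this thread):

```python
# Read a huge file sequentially in large fixed-size fragments, so that one
# calculation cycle costs a handful of big reads instead of millions of small ones.

CHUNK_SIZE = 256 * 1024 * 1024  # 256 MB per read; keep it well under a 32-bit address space

def process_chunk(chunk: bytes) -> None:
    # placeholder for the actual calculation over one fragment;
    # real code must also carry over a sequence that straddles two fragments
    pass

def one_pass(path: str) -> None:
    """One full calculation cycle = several sequential fragment reads."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            process_chunk(chunk)

# one_pass("big_data.txt")  # hypothetical 20 GB input
```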

the question is whether that will be faster than loading the data into a database and processing it via SQL queries - I think not.

so it would most likely be faster to feed the data into SQL

 
YuraZ:

imho - to search compressed data you must first uncompress it

Not necessarily. You can compress what you're looking for. There you go! :)
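For what it's worth, the idea can work, but only under a strong assumption: the whole file must be encoded with a single fixed, context-free codebook, so the search pattern encodes to the same bytes as it does inside the data. A minimal Python sketch with a made-up dictionary codebook (adaptive compressors like Huffman or Lempel-Ziv do not satisfy this, as discussed later in the thread):

```python
# A fixed, hypothetical codebook: every token always maps to the same byte,
# regardless of what surrounds it. That property is what makes the search possible.
CODEBOOK = {"EURUSD": b"\x01", "GBPUSD": b"\x02", "buy": b"\x10", "sell": b"\x11"}

def encode(text: str) -> bytes:
    return b"".join(CODEBOOK[word] for word in text.split())

data = encode("EURUSD buy GBPUSD sell EURUSD sell")  # the stored, compressed stream
needle = encode("GBPUSD sell")                       # compress what we're looking for

print(needle in data)  # True - found without decompressing anything
```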
 

The soundest solution would of course be to change the algorithm itself. But since the algorithm is unknown, there is nothing concrete to suggest here. General thoughts may, of course, miss the mark entirely.

For example, since the file has to be read repeatedly, that reading presumably happens in some inner "loop". One could try to "move" the reading out into the outer "loop" - the quotes are there because, in general, such a "move" could mean creating a completely new algorithm from scratch :).

Another hint comes from the mention of sequential reads with a shift - an iterative algorithm that needs to read only the "shift" could solve the problem. But then again, that could mean creating a completely different algorithm from scratch.
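A toy illustration of such a "shift-only" iteration (Python; the data and window are invented): a rolling sum where each step touches only the value that enters and the value that leaves, instead of re-reading the whole window.

```python
def rolling_sums(values, window):
    """Yield the sum of each window, recomputing only the 'shift' at each step."""
    s = sum(values[:window])  # the one full read
    yield s
    for i in range(window, len(values)):
        s += values[i] - values[i - window]  # process only what shifted in and out
        yield s

print(list(rolling_sums([1, 2, 3, 4, 5], 3)))  # [6, 9, 12]
```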

Or maybe this isn't about that at all :)

 
Integer:
Not necessarily. You can compress what you're looking for. There you go! :)

---

There is a large amount of information (about 20 GB in a text file).

The information consists of the same kind of sequences, about a million of them.

It is necessary to go through all the sequences repeatedly and make some calculations.

---

the data from those 20 gigs has to be fed into the algorithm.

it's not a search - he has the data in the form of a file - and that data is fed into the calculation algorithm

 
Candid:

The soundest solution would of course be to change the algorithm itself. But since the algorithm is unknown, there is nothing concrete to suggest here. General thoughts may, of course, miss the mark entirely.

For example, since the file has to be read repeatedly, that reading presumably happens in some inner "loop". One could try to "move" the reading out into the outer "loop" - the quotes are there because, in general, such a "move" could mean creating a completely new algorithm from scratch :).

Another hint comes from the mention of sequential reads with a shift - an iterative algorithm that needs to read only the "shift" could solve the problem. But then again, that could mean creating a completely different algorithm from scratch.

Or maybe this isn't about that at all :)

it is logical to move the algorithm, together with the large volume of data, onto an SQL server
 
YuraZ:

---

There is a large amount of information (about 20 GB in a text file).

The information consists of the same type of sequences, about a million of them.

It is necessary to go through all the sequences repeatedly and make some calculations.

---

the data from those 20 gigabytes has to be fed into the algorithm.

it's not a search - he has the data in the form of a file - and that data is fed into the calculation algorithm

Pure speculation. Of course it could be anything, but I'm guessing it's a search. I can even guess what for.
 
Integer:
Not necessarily. You can compress what you're looking for. There you go! :)

You amaze me, my dear))))

What algorithm shall we use for compression? Huffman, Lempel-Ziv?

Well, it will give you 4-8x compression on a text file. And bear in mind that compression algorithms build a different coding tree for each file.

In other words, the source file will have one tree, and the fragment you are searching for will have another.

I'm just curious how you propose to search it, even theoretically ))))

Data compression is nothing but coding. If we draw an analogy with encryption, we get two different messages (the compressed data) encrypted with different keys (the coding trees).

There's no way even to compare them, let alone run a search )))
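The objection is easy to demonstrate with a real adaptive compressor, e.g. zlib's deflate from the Python standard library (the sample strings are arbitrary):

```python
import zlib

haystack = ("history " * 1000 + "the pattern we want " + "more history " * 1000).encode()
needle = b"the pattern we want"

compressed_haystack = zlib.compress(haystack)
compressed_needle = zlib.compress(needle)

# Deflate builds its coding tables from the data itself, so the compressed
# bytes of a fragment depend on everything around it:
print(compressed_needle in compressed_haystack)        # False - no byte-level match
print(needle in zlib.decompress(compressed_haystack))  # True - the match does exist
```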

 
elugovoy:

You amaze me, my dear))))

What algorithm shall we use for compression? Huffman, Lempel-Ziv?

Well, it will give you 4-8x compression on a text file. And bear in mind that compression algorithms build a different coding tree for each file.

In other words, the source file will have one tree, and the fragment you are searching for will have another.

I'm just curious how you propose to search it, even theoretically ))))

Data compression is nothing but coding. If we draw an analogy with encryption, we get two different messages (the compressed data) encrypted with different keys (the coding trees).

There's no way even to compare them, let alone run a search )))

Oh, and I'm amazed by plenty of things here myself.

I think even a child could understand this. If you compress some text with a given algorithm, its compressed form will be exactly the same today and tomorrow.

Or are you saying that compressing two different texts with the same algorithm can produce two completely identical data sequences?
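That much is true: for an identical complete input, compression is deterministic; whether that helps search inside an adaptively compressed file is a separate question. A quick check with zlib (arbitrary sample text):

```python
import zlib

text = b"the same text, the same algorithm"
print(zlib.compress(text) == zlib.compress(text))  # True - same input, same output
```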

 
Integer:
Pure speculation. Of course it could be anything, but I'm guessing it's a search. I can even guess what for.

>>> I don't know what I'm looking for ...

>>> You have to go through all the sequences repeatedly and do some calculations.

Well - yes - a search - but a search through 20 gigs ...

Basically, any search comes down to some kind of lookup and comparison.

I'm going by what the author wrote

Maybe the data can't be shrunk - compressed - indexed.


it is logical to put the data into SQL

and move the business logic to the server - data plus calculations

the Expert Advisor would send the server only a short request for enumeration and calculation

and receive a ready answer.
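A minimal sketch of that division of labour, with Python and SQLite standing in for a real SQL server (the table, columns and query are all hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real setup would use a persistent server database
conn.execute("CREATE TABLE seq (id INTEGER, step INTEGER, value REAL)")

# One-time bulk load, standing in for importing the 20 GB text file.
rows = [(i, s, float(i * s)) for i in range(1000) for s in range(10)]
conn.executemany("INSERT INTO seq VALUES (?, ?, ?)", rows)
conn.execute("CREATE INDEX idx_seq_id ON seq (id)")  # index so queries avoid full scans
conn.commit()

# The Expert Advisor side: send a short request, receive a ready answer.
(answer,) = conn.execute(
    "SELECT AVG(value) FROM seq WHERE id BETWEEN ? AND ?", (10, 20)
).fetchone()
print(answer)
conn.close()
```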
