Need help! Can't solve the problem: I'm hitting hardware limitations - page 9

 
Urain:

To Komposter: Andrei, if you're stuck on the data-size problem, it means you've made a mistake in how you formulated the task.

There are three options here:

1. Think it through yourself.

2. Open the problem up on a public forum.

3. Solve the problem in private (with everyone you think can solve it and trust to keep it secret).

Let me explain what I mean. If you are storing news, you can write out each news item as a whole string, or you can encode the typical phrases (compression): "account balance" becomes 1, "account equity" becomes 2, and so on. Another typical mistake is wanting to write the data already sorted. For large sizes this is fatal; it's easier to append to the end and sort indirectly, through an index.
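The phrase substitution described above can be sketched as a simple dictionary encoding. This is a minimal illustration, assuming each record is already split into string fields; the phrase table and the names `PHRASES`, `encode` and `decode` are hypothetical.

```python
# Hypothetical phrase table: each frequent phrase gets a small integer code,
# as in the "account balance" -> 1, "account equity" -> 2 example.
PHRASES = ["account balance", "account equity"]
CODE_OF = {phrase: code for code, phrase in enumerate(PHRASES, start=1)}
PHRASE_OF = {code: phrase for phrase, code in CODE_OF.items()}


def encode(fields):
    """Replace every known phrase with its code; keep other fields verbatim."""
    return [CODE_OF.get(field, field) for field in fields]


def decode(encoded):
    """Invert encode(): integer codes become phrases again (fields are strings)."""
    return [PHRASE_OF[item] if isinstance(item, int) else item for item in encoded]


record = ["account balance", "1000.50", "account equity", "998.20"]
packed = encode(record)
print(packed)  # [1, '1000.50', 2, '998.20']
```

The saving comes from storing a small code instead of the full phrase in every record; the table itself is stored once.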

I think it's now clear what I mean when I say there is an error in how the task is defined.
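The "append to the end and sort by index" idea can be sketched as follows. This is an in-memory illustration under the assumption that rows are small Python values; `AppendOnlyStore` is a hypothetical name. Inserting into a sorted position would shift every later row, while here rows never move and only a list of positions is sorted.

```python
class AppendOnlyStore:
    """Records are only ever appended; the ordering lives in a separate index."""

    def __init__(self):
        self._rows = []

    def append(self, row):
        """Amortized O(1): no existing row has to move."""
        self._rows.append(row)
        return len(self._rows) - 1  # position of the new row

    def sorted_index(self, key):
        """Row positions ordered by key; the rows themselves stay in place."""
        return sorted(range(len(self._rows)), key=lambda pos: key(self._rows[pos]))

    def __getitem__(self, pos):
        return self._rows[pos]


store = AppendOnlyStore()
for stamp in ["10:09:28.", "08:01:33.", "08:16:53."]:
    store.append(stamp)

index = store.sorted_index(key=lambda row: row)
print(index)                          # [1, 2, 0]
print([store[pos] for pos in index])  # ['08:01:33.', '08:16:53.', '10:09:28.']
```

For data on disk the same design applies: the data file stays append-only and only the (much smaller) index is rewritten when the order changes.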

Well, you can make such a substitution in a good text editor. I'd guess that in any case (at least for news) the redundancy will be over 40%.

However, that won't remove the need to write code for processing the text data. And once the volume problem is solved, a performance problem may creep in.

And in general, the problem statement hasn't been fully disclosed, that's a fact... not much data, but more options ))

 

That's not what we're talking about here.

 
YuraZ:

>>> We're talking about comparing two compressed sequences.

A unique element has no repeating sequence, so it is not compressed, and the search fails.

Check it in practice.

Example: compress two files with RAR.

ELEMENT: 08:01:

AND SEQUENCE:

08:01:33.
08:01:38.
08:01:38.
08:01:39.
08:01:40.
08:01:49.
08:01:57.
08:16:53.
08:16:59.
10:09:28.
10:09:29.
10:09:29.
10:09:29.
10:09:30.
10:32:23.
10:32:24.
10:56:11.
10:56:12.
10:56:12.
10:56:12.
10:56:13.
10:56:39.
10:56:39.
10:56:39.
10:56:48.
10:56:48.
10:56:48.
10:57:03.
10:57:04.
10:57:04.
10:57:07.
10:57:07.
10:57:07.
10:57:51.
10:57:52.
11:44:50.
11:44:52.
11:44:52.
11:44:52.
11:44:53.
12:57:35.
12:57:46.
12:57:46.
12:57:46.
14:01:41.
14:01:46.
14:20:06.
14:20:08.



---

Open both RAR files in a hex editor and try to find the element.

You will find that the element 08:01: is not compressed,

and it won't match what's in the second file.


If you compress each entry (each line) separately, you will not gain anything in size.

On the contrary, you will increase the data volume because of the archiver's control data.
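The size effect YuraZ describes can be reproduced in a few lines. This sketch uses Python's `zlib` module as a stand-in for RAR (an assumption: RAR's container overhead differs, but the effect is the same) and the first 16 timestamps from the list above as records.

```python
import zlib

# The first 16 records from the timestamp list above, one record per line.
RECORDS = [
    "08:01:33.", "08:01:38.", "08:01:38.", "08:01:39.", "08:01:40.",
    "08:01:49.", "08:01:57.", "08:16:53.", "08:16:59.", "10:09:28.",
    "10:09:29.", "10:09:29.", "10:09:29.", "10:09:30.", "10:32:23.",
    "10:32:24.",
]


def raw_size(records):
    """Bytes needed to store the records uncompressed, one per line."""
    return sum(len(rec.encode()) + 1 for rec in records)  # +1 for "\n"


def whole_file_size(records):
    """Bytes after compressing all records together as a single stream."""
    data = "".join(rec + "\n" for rec in records).encode()
    return len(zlib.compress(data))


def per_record_size(records):
    """Bytes after compressing every record on its own."""
    return sum(len(zlib.compress((rec + "\n").encode())) for rec in records)


print("raw:       ", raw_size(RECORDS))  # 160
print("whole file:", whole_file_size(RECORDS))
print("per record:", per_record_size(RECORDS))
```

Compressed together, the repeated prefixes such as `08:01:` are shared across records; compressed one by one, each 10-byte record gets its own stream header and checksum, which outweigh any saving.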

Each record has to be compressed individually. Of course, this only makes sense when the records are large enough to actually compress.

 
Integer:

Each record has to be compressed individually. Naturally, this only makes sense if the records are large enough to actually compress.

So we have come to the conclusion that we must work in blocks. Accordingly, it will be impossible to find a piece of information whose size doesn't match a block.
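One way to read "work in blocks" is sketched below: records are compressed in groups, the groups are cut only at record boundaries (so a pattern inside one record is never split across two blocks), and a search decompresses one block at a time. This is a minimal illustration, not anyone's actual design from the thread; `zlib`, `per_block=4` and the names `pack_blocks` and `find` are assumptions, and a real 20 GB file would use far larger blocks read from disk.

```python
import zlib

# A few records from the timestamp list earlier in the thread, one per line.
RECORDS = [
    "08:01:33.", "08:01:38.", "08:01:38.", "08:01:39.", "08:01:40.",
    "08:01:49.", "08:01:57.", "08:16:53.", "08:16:59.", "10:09:28.",
    "10:09:29.", "10:09:29.", "10:09:29.", "10:09:30.", "10:32:23.",
    "10:32:24.",
]


def pack_blocks(records, per_block):
    """Compress records in groups of `per_block`, cutting only at record boundaries."""
    return [
        zlib.compress("\n".join(records[start:start + per_block]).encode())
        for start in range(0, len(records), per_block)
    ]


def find(blocks, pattern, per_block):
    """Decompress one block at a time and return the global positions of
    records containing `pattern`. Every block except possibly the last
    holds exactly `per_block` records, so positions can be recomputed."""
    hits = []
    for block_no, blob in enumerate(blocks):
        rows = zlib.decompress(blob).decode().split("\n")
        for offset, row in enumerate(rows):
            if pattern in row:
                hits.append(block_no * per_block + offset)
    return hits


blocks = pack_blocks(RECORDS, per_block=4)
print(find(blocks, "08:01:", per_block=4))  # [0, 1, 2, 3, 4, 5, 6]
print(find(blocks, "10:32:", per_block=4))  # [14, 15]
```

The trade-off is the one discussed here: larger blocks compress better, but finding even a single short record means decompressing the whole block it sits in.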
 
Contender:
Well, we have come to the conclusion that we must work in blocks. Accordingly, it will be impossible to find a piece of information whose size doesn't match a block.

Why? Where does that follow from? What's the problem? We didn't come to that conclusion.

 
Integer:

Why? Where does that follow from? What's the problem? We didn't come to that conclusion.

And he's still talking about balls with Mischka :)
 
Integer:

Each record has to be compressed individually. Naturally, this only makes sense if the records are large enough to actually compress.

Dimitri, if you compress each record (each line) on its own,

you will increase the volume. Check it!


201 a1.rar: aaaa1 compressed (and it really did get smaller)

it was 535: aaaa1

became 77 a2.rar: one element "compressed", or rather not compressed at all, just the file plus control bytes

it was 8: aaaa2

---

The data volume will grow from 20 GB to 20 GB of search elements plus the control bytes for each of them.

What's the point then?
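YuraZ's two measurements can be turned into a rough estimate. This assumes that a short record is stored as-is (as YuraZ observed for aaaa2) and that the archive overhead is about the same for every record, which is only an approximation; N (number of records) and s (average record size in bytes) are symbols introduced here for illustration.

$$\frac{201}{535} \approx 0.38, \qquad 77 - 8 = 69 \text{ bytes of overhead}$$

$$\text{archived one by one: } N(s + 69) \quad \text{vs.} \quad \text{raw: } Ns$$

Whenever s < 69, we get s + 69 > 2s, so archiving each record separately more than doubles the volume instead of shrinking it.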


Yes, I know that if the data is short, archiving increases its size.
 
Integer:
Yes, I know that if the data is short, archiving increases its size.

Mm-hmm, that's why the approach you suggested is no good.
