Need help! Can't solve the problem, I'm hitting hardware limitations

 
ALXIMIKS:

I remembered a site where a similar problem and some C++ approaches to solving it were discussed.

Thanks, I'll read it.

Ivan Ivanov:
Sorry, but what if I try 64-bit, or will MT only run on 32?
I naively thought that such a math-heavy thing ought to run on 64 bits.
Take aerodynamics calculation software: it doesn't run on 32-bit.
I know the main argument that 32-bit is faster to run on a computer, but that's a hardware problem, IMHO.

Switching to x64 just pushes the ceiling back, and not very far. I'm not going to run out and buy another 16 GB of memory, am I? ;)

 
anonymous:

1. Naturally, use an x64 system.

2. Rent a more powerful machine in the Amazon EC2 cloud and do the calculations on it.

3. Use compressed data and decompress it in memory on the fly. Real data compresses better if you split it into streams (sign/mantissa/exponent); you can also use a 12-bit float at the expense of accuracy (see the sketch after this list).

4. Do the calculation outside the advisor with something that can handle big data (Matlab/R/etc.).
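
A rough sketch of the stream splitting from point 3 (plain IEEE-754 splitting in C++; the 12-bit quantisation step is not shown, and the names are made up for illustration):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Split IEEE-754 floats into separate sign/exponent/mantissa streams.
// Homogeneous streams usually compress much better than interleaved floats.
struct FloatStreams {
    std::vector<uint8_t>  sign;      // 1 bit per value, stored as a byte here for simplicity
    std::vector<uint8_t>  exponent;  // 8 bits per value
    std::vector<uint32_t> mantissa;  // 23 bits per value
};

FloatStreams split_floats(const std::vector<float>& values)
{
    FloatStreams s;
    s.sign.reserve(values.size());
    s.exponent.reserve(values.size());
    s.mantissa.reserve(values.size());

    for (float f : values) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);   // reinterpret the float's bit pattern
        s.sign.push_back(static_cast<uint8_t>(bits >> 31));
        s.exponent.push_back(static_cast<uint8_t>((bits >> 23) & 0xFF));
        s.mantissa.push_back(bits & 0x7FFFFF);
    }
    return s;
}
```

Each stream is then compressed separately; the mantissa stream is where a lossy cut (e.g. down to 12 bits) would be applied.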

1, 2: These just push the ceiling back, and I'd like to solve the problem without being tied to a specific figure.

3. The problem is not the amount of data on disk, but rather the amount in memory. I can compress it by another 10-20%, but again, that won't solve the problem.

4. I'm still hoping to stay inside the sandbox for now, so that I won't have to write copiers/synchronizers later...

Thanks for your participation!

 
komposter:
Switching to x64 just pushes the ceiling back, and not very far. I'm not going to run out and buy another 16 GB of memory, am I? ;)

You don't work with this kind of data all the time, do you? Write for x64 and run it on Amazon when needed; you can also debug there on a micro instance.

If, however, you face this problem all the time, you can buy 64 GB of memory for about $1k, e.g. Corsair Vengeance Pro CMY64GX3M8A2133C11.

Or rethink the algorithm so that it works in a single pass over the data.

p.s. You can also keep the data compressed in memory and decompress it only for as long as it takes to process it.
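
For example, something like this with zlib (just a sketch, assuming one fixed-size chunk at a time; error handling is minimal):

```cpp
#include <stdexcept>
#include <vector>
#include <zlib.h>

// Keep a chunk of data compressed in RAM and inflate it only while it is being processed.
struct CompressedChunk {
    std::vector<unsigned char> packed;      // deflated bytes
    uLong                      rawSize = 0; // original size, needed for decompression
};

CompressedChunk pack(const std::vector<unsigned char>& raw)
{
    CompressedChunk c;
    c.rawSize = static_cast<uLong>(raw.size());
    uLongf packedSize = compressBound(c.rawSize);
    c.packed.resize(packedSize);
    if (compress2(c.packed.data(), &packedSize, raw.data(), c.rawSize, Z_BEST_SPEED) != Z_OK)
        throw std::runtime_error("compress2 failed");
    c.packed.resize(packedSize);            // shrink to the actual compressed size
    return c;
}

std::vector<unsigned char> unpack(const CompressedChunk& c)
{
    std::vector<unsigned char> raw(c.rawSize);
    uLongf rawSize = c.rawSize;
    if (uncompress(raw.data(), &rawSize, c.packed.data(),
                   static_cast<uLong>(c.packed.size())) != Z_OK)
        throw std::runtime_error("uncompress failed");
    return raw;                             // process it, then let it go out of scope again
}
```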

 
komposter:

Thanks, I'll read it.

Switching to x64 just pushes the ceiling back, and not very far. I'm not going to run out and buy another 16 GB of memory, am I? ;)

You've got to be kidding me :-)
Here I am, a dummy, making do with 8 gigs.
 

Option 1: Cut the file into pieces.

Option 2: Cut the file into pieces, but also organise it, like words in a dictionary: an entry starting with "A" is looked up in "A.txt". That way you can arrange the data as a tree (like a dictionary: folders A, B, ...; inside folder A, folders AA, AB, and so on), and the search will be very fast.
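
A rough sketch of that kind of bucketing (a single-level split by first letter; file names and layout are purely illustrative):

```cpp
#include <cctype>
#include <fstream>
#include <map>
#include <string>

// Split a big text file into per-letter buckets: lines starting with 'A' go to A.txt, etc.
// A later lookup then only has to open the one small file that can contain the entry.
void split_by_first_letter(const std::string& inputPath)
{
    std::ifstream in(inputPath);
    std::map<char, std::ofstream> buckets;   // one output file per first letter

    std::string line;
    while (std::getline(in, line)) {
        if (line.empty())
            continue;
        char key = static_cast<char>(std::toupper(static_cast<unsigned char>(line[0])));
        auto it = buckets.find(key);
        if (it == buckets.end())
            it = buckets.emplace(key, std::ofstream(std::string(1, key) + ".txt")).first;
        it->second << line << '\n';
    }
}
```

The same idea extends to a second level (AA.txt, AB.txt, ...) if the first-level buckets are still too big.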

 
komposter:

So you'll have to read it many times over, and that:

  • is very, very slow;
  • will wear a hole in the drive.

A virtual RAM disk to the rescue ;)

No hole in the drive, and you'll like the speed.

And the whole volume is available at once.

And no cutting into pieces, because the pieces don't suit the task.

 
I would try to cut the file into chunks and load each chunk as needed (i.e. as Dima suggests). It's hard to say exactly, as it depends on the specific task. But the problem is interesting, keep me posted on your findings.
 
komposter:

1. That's a cache... Or do I misunderstand what you mean? My option of constantly re-reading the needed chunks?

Well... read the file through a wrapper: the wrapper keeps a small part of the file in memory and serves it up without re-reading. I mean, you know how the file is used, so the wrapper should turn out quite efficient.
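
A minimal sketch of such a wrapper, under the assumption that reads are clustered (the class name, window size and interface are made up; a returned pointer is only valid until the next read):

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// File wrapper that keeps only a small window of the file in RAM.
// Reads inside the current window are served from memory; anything else refills the window.
class WindowedFile {
public:
    explicit WindowedFile(const std::string& path, std::size_t windowSize = 64 * 1024 * 1024)
        : file_(path, std::ios::binary), windowSize_(windowSize) {}

    // Read `count` bytes (count <= window size) starting at absolute file offset `offset`.
    const unsigned char* read(std::uint64_t offset, std::size_t count)
    {
        if (offset < windowStart_ || offset + count > windowStart_ + window_.size())
            load_window(offset);                     // cache miss: reload the window
        return window_.data() + (offset - windowStart_);
    }

private:
    void load_window(std::uint64_t offset)
    {
        windowStart_ = offset;
        window_.resize(windowSize_);
        file_.clear();                               // clear a possible EOF flag
        file_.seekg(static_cast<std::streamoff>(offset));
        file_.read(reinterpret_cast<char*>(window_.data()),
                   static_cast<std::streamsize>(windowSize_));
        window_.resize(static_cast<std::size_t>(file_.gcount()));
    }

    std::ifstream              file_;
    std::size_t                windowSize_;
    std::uint64_t              windowStart_ = 0;
    std::vector<unsigned char> window_;
};
```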

komposter:

Oh shit...

Same thing, just from a different angle. Reading might speed up, but it doesn't solve the problem globally.

Well, I was thinking of repeated operations on a small scale.

The point of using mapping is to use Windows' cache instead of writing your own. Load a chunk, read it, unload it. If the chunk is used often, Windows will keep it in memory.
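
For reference, a chunk-by-chunk read-only mapping on Windows looks roughly like this (error handling trimmed; note that the offset passed to MapViewOfFile must be a multiple of the system allocation granularity, usually 64 KB):

```cpp
#include <windows.h>
#include <cstddef>
#include <cstdint>

// Map one chunk of a big file, let the Windows cache do the work, then unmap it.
// If the same chunk is touched often, its pages tend to stay in the system cache.
void process_chunk(const wchar_t* path, std::uint64_t offset, std::size_t size)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return;

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (mapping) {
        const void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                                         static_cast<DWORD>(offset >> 32),
                                         static_cast<DWORD>(offset & 0xFFFFFFFF),
                                         size);
        if (view) {
            // ... read the chunk through `view` here ...
            UnmapViewOfFile(view);
        }
        CloseHandle(mapping);
    }
    CloseHandle(file);
}
```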

anonymous:

3. Use compressed data and decompress it in memory on the fly. Real data compresses better if you split it into streams (sign/mantissa/exponent); you can also use a 12-bit float at the expense of accuracy.

4. Do the calculation outside the advisor with something that can handle big data (Matlab/R/etc.).

Or so (c).
 
Without knowing the specifics of the data structure and the operations to be performed, only general advice can be given. One option is to convert the raw data into metadata of a smaller size, say the same 4 GB, in one or several passes (but without wearing out the disk), and then work with the metadata (aggregate values, slices by some parameter, etc.). If that does not work, load the data into a DBMS.
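
To illustrate the metadata idea, a single pass that boils the raw lines down to per-key aggregates might look like this (the "key value" line format is only an assumption about the data):

```cpp
#include <algorithm>
#include <fstream>
#include <limits>
#include <sstream>
#include <string>
#include <unordered_map>

// One pass over the big text file: keep only per-key aggregates in memory,
// then work with this much smaller metadata instead of the raw data.
struct Aggregate {
    double    sum   = 0.0;
    double    min   = std::numeric_limits<double>::max();
    double    max   = std::numeric_limits<double>::lowest();
    long long count = 0;
};

std::unordered_map<std::string, Aggregate> aggregate_file(const std::string& path)
{
    std::unordered_map<std::string, Aggregate> meta;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        double value;
        if (!(fields >> key >> value))   // skip lines that don't match the assumed format
            continue;
        Aggregate& a = meta[key];
        a.sum += value;
        a.min = std::min(a.min, value);
        a.max = std::max(a.max, value);
        ++a.count;
    }
    return meta;
}
```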
 
komposter:

There is a large amount of information (about 20 GB in a text file).

...

And if you compress this file with an archiver, how big does it come out (text like that should compress very well)?
