RE: Memory allocation strategy -- RPG400-L

After giving it some more thought and reading the posts, I came to the conclusion that reading the file line by line does not perform well. I now read the file block by block (64 KB). Probably 99% of the files will fit in a single block, but if one does not, my program just reads in the next block of data. No problem there.
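
For readers following along, the block-wise read described above could look roughly like this in C. This is only a sketch under assumptions: the names read_in_blocks, block_handler and count_block are made up for illustration, and the actual program in this thread is RPG, not C.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE (64 * 1024)

/* Hypothetical callback type: receives each block as it is read. */
typedef void (*block_handler)(const char *data, size_t len, void *ctx);

/* Example handler: counts how many blocks were delivered. */
void count_block(const char *data, size_t len, void *ctx)
{
    (void)data;
    (void)len;
    ++*(size_t *)ctx;
}

/* Reads `path` in blocks of up to 64 KB and passes each block to `handle`.
 * Returns the total number of bytes read, or -1 on error.
 * The static buffer keeps the sketch short but makes it non-reentrant. */
long read_in_blocks(const char *path, block_handler handle, void *ctx)
{
    static char buf[BLOCK_SIZE];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* read() returns 0 at end of file and a negative value on error. */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        handle(buf, (size_t)n, ctx);
        total += n;
    }

    close(fd);
    return n < 0 ? -1 : total;
}
```

A file smaller than 64 KB results in a single call to the handler; larger files simply go around the loop again.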

I don't see a big advantage in using mmap when only reading data from a file. I would still need to allocate storage for the buffer, and I would still pass the data along block by block (though I wouldn't have to call read).

Thanx for your thoughts

Mihael

-----Original Message-----
From: rpg400-l-bounces@xxxxxxxxxxxx [mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Scott Klement
Sent: Friday, January 07, 2011 4:35 PM
To: RPG programming on the IBM i / System i
Subject: Re: Memory allocation strategy

Hi Mihael,

I agree with Mark.

I don't understand why you'd allocate memory chunk-by-chunk when the
total size of the stream file is easily determined beforehand? Are you
translating it to/from a format like UTF-8 where the size might not match?

The way I understand it... if you call %realloc() the system will try
to extend your existing allocation if it can. I think, in most cases,
it will... but it's not guaranteed. Depends on the size you requested,
and what other data is stored nearby. If it can't extend the
allocation, it moves the data to a new allocation and returns the new
pointer -- which might be a performance issue if you're doing it
frequently on large allocations.

So, if you can easily get the size of the stream file beforehand (with
stat(), as Mark suggests), why not allocate it all at once?

The only thing I can think of... you're doing CCSID translation, so the
buffer in your program might not match. In that case, I'd try to use
the file size as a guide, but be ready to extend it if necessary.
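
A rough C sketch of that idea: size the buffer from the file size, but be ready to extend it. The function name load_whole_file is hypothetical, and fstat() (the descriptor-based sibling of stat()) is used here only so the size is taken from the file that was actually opened. No CCSID translation is shown.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Loads a whole file into one heap buffer sized from fstat().
 * On success returns the buffer (caller frees it) and stores the number of
 * bytes read in *len. Returns NULL on failure. */
char *load_whole_file(const char *path, size_t *len)
{
    struct stat st;
    size_t cap, used = 0;
    char *buf;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }

    /* One spare byte lets the final read() report end of file
     * without forcing a reallocation in the common case. */
    cap = (size_t)st.st_size + 1;
    buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    for (;;) {
        if (used == cap) {
            /* The size was only a guide: extend the buffer. */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        n = read(fd, buf + used, cap - used);
        if (n <= 0)
            break;
        used += (size_t)n;
    }

    close(fd);
    if (n < 0) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

If the data turns out larger than the reported size, the buffer doubles, so the number of reallocations stays small even in that case.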

mmap() is a tool that I've tried to use in previous projects, but
failed. Now I can't remember the details -- something to do with
limitations of what you can/can't do with mmap(). It might've simply
been a peculiarity of the project I was working on! Let us know if it
helps you...?

On 1/7/2011 8:17 AM, Mark S. Waterbury wrote:

Mihael:

See embedded remarks below.

Mark

> On 1/7/2011 7:21 AM, Schmidt, Mihael wrote:

Hi,

I am loading some data from a streamfile and I have no clue how big things can get.

The "stat" API will return the size of the streamfile. (See "Unix-type
APIs" in the InfoCenter.)

Strategy 1: I am allocating some memory (%alloc) and for every line extend the allocated memory (%realloc).
Strategy 2: I get one "big" block of memory and only do a reallocation if the data doesn't fit. (more work on my side)

Doesn't the system do strategy 2 under the hood already? If I allocate 100 bytes, the system probably allocates a page (or so) and not only 100 bytes. So a realloc should not trigger another allocation of a whole page, should it?

You could easily write some small test programs to empirically determine
the "allocation strategy", e.g. by reporting the addresses returned.
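
A small test of that kind might look like the following in C. It is a sketch only: count_realloc_moves is a made-up name, and the result depends entirely on the platform's allocator, so it says nothing about what %realloc does on any given system until it is actually run there.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Grows one allocation `steps` times by `step` bytes each and counts how
 * often realloc() returned a different address (i.e. the data was moved).
 * With `verbose` set, each size and address is printed.
 * Returns the number of moves, or -1 if an allocation failed. */
int count_realloc_moves(size_t step, int steps, int verbose)
{
    size_t size = step;
    int moves = 0;
    char *p = malloc(size);

    if (p == NULL)
        return -1;

    for (int i = 0; i < steps; i++) {
        /* Save the old address as an integer before realloc() may free it. */
        uintptr_t before = (uintptr_t)p;
        char *q;

        size += step;
        q = realloc(p, size);
        if (q == NULL) {
            free(p);
            return -1;
        }
        if ((uintptr_t)q != before)
            moves++;
        if (verbose)
            printf("%zu bytes at %p\n", size, (void *)q);
        p = q;
    }

    free(p);
    return moves;
}
```

Calling it with a small step (say 100 bytes) and then with a large one shows whether small extensions are absorbed in place or whether each one moves the data.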

It is usually better to write your own memory management layer that
allocates large blocks from the OS, and then sub-allocates from within
these large blocks, using whatever custom allocation strategy is deemed
best for the job at hand.
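
One very simple form of such a layer is a "bump" allocator: take one large block and hand out consecutive pieces of it. The sketch below is an illustration under assumptions, not a recommendation for any particular job. The names arena, arena_init, arena_alloc and arena_free_all are hypothetical, and the 16-byte alignment is an assumed value.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_ALIGN 16  /* assumed alignment for every sub-allocation */

typedef struct {
    char  *base;  /* the one large block obtained from the system */
    size_t cap;   /* total size of that block */
    size_t used;  /* bytes handed out so far */
} arena;

/* Obtains the large block. Returns 0 on success, -1 on failure. */
int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Hands out the next `n` bytes (rounded up to ARENA_ALIGN) from the block.
 * Returns NULL when the block is exhausted. */
void *arena_alloc(arena *a, size_t n)
{
    void *p;

    if (n > SIZE_MAX - (ARENA_ALIGN - 1))
        return NULL;
    n = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (n > a->cap - a->used)
        return NULL;

    p = a->base + a->used;
    a->used += n;
    return p;
}

/* Releases everything at once; individual pieces are never freed. */
void arena_free_all(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

This trades flexibility for speed: allocation is a pointer increment, but memory only comes back when the whole block is released.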

Please enlighten me.

Instead of reading the entire streamfile into memory, use "mmap" (or
"mmap64") API to directly map the streamfile into teraspace. This allows
you to directly address the data in the file as a large array in memory,
and it is paged in on demand.
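
In generic POSIX C, mapping a file read-only and treating it as an array looks roughly like this. It is a sketch under assumptions: count_lines_mapped is a made-up example function, and any platform-specific requirements (such as the teraspace storage model mentioned above) are not shown.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts the newline characters in a file by mapping it read-only
 * instead of reading it into a buffer. Returns -1 on error. */
long count_lines_mapped(const char *path)
{
    struct stat st;
    const char *data;
    long lines = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        /* A zero-length mapping is not allowed, so handle it up front. */
        close(fd);
        return 0;
    }

    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    /* The file contents are now addressable like an ordinary array. */
    for (off_t i = 0; i < st.st_size; i++) {
        if (data[i] == '\n')
            lines++;
    }

    munmap((void *)data, (size_t)st.st_size);
    return lines;
}
```

No read buffer is allocated and no read call is issued; pages are brought in as the loop touches them.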

Thanx in advance.

Mihael Schmidt